A GENERAL ASYMPTOTIC SCHEME FOR INFERENCE
UNDER ORDER RESTRICTIONS



By D. Anevski and O. Hössjer

Göteborg University and Stockholm University

Limit distributions for the greatest convex minorant and its deriva- 
tive are considered for a general class of stochastic processes includ- 
ing partial sum processes and empirical processes, for independent, 
weakly dependent and long range dependent data. The results are ap- 
plied to isotonic regression, isotonic regression after kernel smoothing, 
estimation of convex regression functions, and estimation of mono- 
tone and convex density functions. Various pointwise limit distribu- 
tions are obtained, and the rate of convergence depends on the self 
similarity properties and on the rate of convergence of the processes 
considered. 



1. Introduction. Let $\{x_n\}_{n\ge 1}$ be a sequence of stochastic processes
defined on an interval $J \subset \mathbb{R}$, and (a.s.) bounded from below on $J$. In this
paper we consider the asymptotic behavior as $n \to \infty$ of the greatest convex
minorant of $x_n$,

(1) $T_J(x_n) = \sup\{z;\ z : J \to \mathbb{R},\ z \text{ convex and } z \le x_n\},$

at an interior point $t_0$ of $J$, as well as its derivative,

(2) $T_J(x_n)'(t) = \max_{v<t}\,\min_{u>t}\,\dfrac{x_n(u) - x_n(v)}{u - v};$

see Robertson, Wright and Dykstra [40]. Note that we use the convention
$T_J(x)'(t) = T_J(x)'(t+)$ for any process $x$. The Pool Adjacent Violators
Algorithm (PAVA) used to calculate $T$ can be found, for example, in [40].
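For data observed on a finite grid, the greatest convex minorant in (1) is the lower convex hull of the points, and the right-hand derivative is the slope of the hull segment that starts at or to the left of $t$. The following Python sketch illustrates this; the function names and the example data are illustrative assumptions and are not taken from [40].

```python
from bisect import bisect_right

def greatest_convex_minorant(ts, xs):
    """Vertices (t, T(x)(t)) of the greatest convex minorant of the points
    (ts[i], xs[i]); ts must be strictly increasing."""
    hull = []
    for p in zip(ts, xs):
        # Drop the last vertex while it lies on or above the chord that
        # joins the vertex before it to the new point p.
        while len(hull) >= 2:
            (t1, x1), (t2, x2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[0] - t1) >= (p[1] - x1) * (t2 - t1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def gcm_right_derivative(hull, t):
    """Right-hand derivative T(x)'(t+) for hull[0][0] <= t < hull[-1][0]."""
    knots = [v[0] for v in hull]
    k = bisect_right(knots, t) - 1      # hull segment [knots[k], knots[k+1])
    (t1, x1), (t2, x2) = hull[k], hull[k + 1]
    return (x2 - x1) / (t2 - t1)

# Illustrative data (not from the paper).
hull = greatest_convex_minorant([0, 1, 2, 3, 4], [0, 2, 1, 3, 6])
slope_at_1 = gcm_right_derivative(hull, 1)
slope_at_2 = gcm_right_derivative(hull, 2)
```

At a vertex of the hull, such as $t = 2$ in the example, the function returns the slope of the segment to the right, in line with the convention $T_J(x)'(t) = T_J(x)'(t+)$.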

The class of processes Xn we consider includes partial sum and empirical 
processes for independent, weakly dependent and long range dependent data. 
The estimators (1) and (2) have several important applications, for instance,
nonparametric regression and density estimation under order restrictions.
The regression model has data satisfying

$y_i = m(t_i) + \varepsilon_i,$

where $m$ is the unknown regression function, $t_i = i/n$ are equidistant and
$\{\varepsilon_i\}$ are error terms. If we restrict $m$ to be an increasing function, it is well
known (cf. [10]) that the isotonic regression estimator

(3) $\hat m = \operatorname{argmin}\Bigl\{\sum_{i=1}^{n} (y_i - z(t_i))^2 : z \text{ increasing}\Bigr\}$

is given by (2) at the observation points $\{t_i\}$, with $x_n$ the partial sum process
formed by data. For independent and identically distributed errors $\{\varepsilon_i\}$,
the asymptotic properties of $\hat m$ have been derived in [11, 47] and [33]. For
instance, it follows from [11] that

(4) $C n^{1/3}(\hat m(t_0) - m(t_0)) \xrightarrow{\mathcal{L}} T(s^2 + B(s))'(0),$

as $n \to \infty$, where $T = T_{\mathbb{R}}$, $B$ is a standard two-sided Brownian motion and
$C$ depends on $m'(t_0)$ and $\sigma^2 = \mathrm{Var}(\varepsilon_i)$. The right-hand side of (4) can also
be replaced by $2\operatorname{argmin}_{s\in\mathbb{R}}(s^2 + B(s))$, where we use the convention that,
for any process $x$, $\operatorname{argmin}_{s\in\mathbb{R}}(x(s))$ means the infimum of all points at which
the minimum is attained.

In density estimation, data consists of a stationary process $\{t_i\}_{i=1}^{n}$ with
an unknown marginal density function $f$. If $f$ is increasing and supported on
a (finite or half-infinite) interval $J$, the nonparametric maximum likelihood
estimate (NPMLE)

(5) $\hat f = \operatorname{argmax}\Bigl\{\prod_{i=1}^{n} z(t_i) : z \text{ increasing},\ z \ge 0 \text{ and } \int_J z(u)\,du = 1\Bigr\}$

for independent data can be written as $\hat f = T_J(x_n)'$, where $x_n$ is the empirical
distribution; see [20]. Asymptotic properties of $\hat f$ have been obtained in [39]
and [21]; see also [46]. In particular, (4) holds with $\hat f(t_0)$ and $f(t_0)$ in place of
$\hat m(t_0)$ and $m(t_0)$, with $C$ a constant depending on $f'(t_0)$ and $f(t_0)$. Note also
that increasing density estimation is related to unimodal density estimation;
see [7] and references therein.

We propose to use $T_J(x_n)'$ as an estimator of $m$ and of $f$, also for dependent
data. For the regression problem, $T_J(x_n)'$ minimizes the sum of squares
in (3) no matter what dependence structure we have for $\{\varepsilon_i\}$, while the
likelihood function is much more difficult to write down for dependent data. For
density estimation the interpretation of $T_J(x_n)'$ is

$T_J(x_n)' = \operatorname{argmin}\Bigl\{\sum_{i=1}^{n} (x_i - z(t_i))^2 w_i : z \text{ increasing}\Bigr\},$

where $x_i = (x_n(t_i) - x_n(t_{i-1}))/(t_i - t_{i-1})$ and $w_i = t_i - t_{i-1}$. Thus, $T_J(x_n)'$
is the weighted $l^2$-projection of $(x_1, \ldots, x_n)$ on the convex set of increasing
functions; see [40].
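The weighted projection onto increasing sequences can be computed by pooling adjacent violators. The sketch below is a minimal weighted PAVA in Python, written for illustration under the assumption that the slopes $x_i$ and weights $w_i$ are given as plain lists; it is not the implementation of [40].

```python
def pava(values, weights):
    """Weighted least-squares projection of `values` onto increasing sequences."""
    blocks = []  # each block holds [weighted mean, total weight, number of points]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) >= 2 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

# Slopes of the illustrative cumulative sums 0, 2, 1, 3, 6 on an equispaced grid.
fit_equal = pava([2, -1, 2, 3], [1, 1, 1, 1])
# Unequal weights pull the pooled value toward the heavier point.
fit_weighted = pava([3, 1], [1, 3])
```

With equal weights the fitted values are the successive slopes of the greatest convex minorant of the cumulative sums, which is the connection between PAVA and $T_J(x_n)'$ described above.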

We review these results and show that the same limits are attained if data
are weakly dependent and mixing. For long range dependent subordinated
Gaussian data, in the regression problem we obtain a result reminiscent of
(4), but with a different (nonpolynomial) convergence rate and with $B(\cdot)$
replaced with a process belonging to a class of long range dependent processes,
which includes fractional Brownian motion and the Rosenblatt process; see
[17]. In density estimation, $B(\cdot)$ is replaced by a straight line $Z \cdot s$ in (4) with
$Z \sim N(0,1)$, in the cases for which we are able to check all the conditions.
But since $T(s^2 + Zs)'(0) = (s^2 + Zs)'(0) = Z$, $\hat f$ is asymptotically normal in
this case.

In [34] it is proposed, as an alternative to doing isotonic regression, to
first smooth the data and then do isotonic regression, and the limit
distribution is derived when using a kernel estimator with bandwidth $h \sim n^{-1/5}$
as smoother. We review these results, and also state results for mixing
and long range dependent data; moreover, we treat all possible choices of
bandwidth $h$. An analogous approach is possible for density estimation;
we refrain from stating these results, however, since it will be clear from the
regression arguments how to proceed.

When estimating convex regression functions and density functions, the 
natural approaches would be to do convex regression or NPMLE of a convex 
density, respectively. An algorithm for convex regression has been proposed 
in [27], and a conjecture on the limit distribution can be found in [35]. In [30] 
an iterative algorithm for the NPMLE of a convex density and a conjecture 
on the limit distributions have been proposed; see also [2]. Finally, in [23, 24] 
the limit distributions for the convex regression and for the NPMLE of a 
convex density were derived. 

As an alternative we propose the estimator $T_J(x_n)/c(x_n)$, where $x_n$ is a
kernel estimate of either $m$ or $f$, and $c(x_n) = \int_J T_J(x_n)(u)\,du / \int_J x_n(u)\,du$.
Thus, we obtain a convex function with the same integral over $J$ as $x_n$. The
advantage over the regression and NPMLE approach is twofold: the PAVA
algorithm used to calculate $T$ is noniterative and always converges, and in
this paper we state the limit distributions of $T_J(x_n)$, both for the regression
problem and for the density estimation problem, for weakly dependent data
and long range dependent subordinated Gaussian data. The interpretation
of $T_J(x_n)$ is the following: If $x_n'$ is piecewise continuous, then

(6) $T_J(x_n)' = \operatorname{argmin}\Bigl\{\int_J (x_n'(u) - z(u))^2\,du : z \text{ increasing}\Bigr\}$

and, thus, $T_J(x_n)'$ is the $L^2$-projection of $x_n'$ on the convex set of monotone
functions; $T_J(x_n)$ is the primitive function of the solution to (6).
Our general convergence results can be written as

(7) $d_n^{-p}\bigl(T_J(x_n)(t_0) - x_n(t_0)\bigr) \xrightarrow{\mathcal{L}} T(|s|^p + v(s))(0),$

for the convex minorant of $x_n$, and

(8) $d_n^{-p+1}\bigl(T_J(x_n)'(t_0) - x_{b,n}'(t_0)\bigr) \xrightarrow{\mathcal{L}} T(|s|^p + v(s))'(0),$

for its derivative. Here $1 < p < \infty$ is a fixed number, $v$ a stochastic process
reflecting the local behavior of $x_n$ around $t_0$, and $x_{b,n}$ is the deterministic
part of $x_n$, for example, $E(x_n)$. The sequence $d_n \downarrow 0$ and $p$ determine the rate
of convergence in (7) and (8). Values of $p$ different from 2 have previously
been considered by Wright [47] and Leurgans [33], and arise, for example,
in nonparametric regression when

$m(t) - m(t_0) = a\,\mathrm{sgn}(t - t_0)|t - t_0|^{p-1} + o(|t - t_0|^{p-1}),$

as $t \to t_0$ for some constant $a > 0$. The rate at which $d_n \downarrow 0$ depends on
the rate of convergence of $x_n$ toward $x_{b,n}$ and on the local self similarity
properties of $x_n$ around $t_0$.

Prakasa Rao [39] was the first to establish limit distributions for $T(x_n)'$
(with $x_n = F_n$ the empirical distribution function) and the approach
presented in that paper has served as a model for later authors: first consider
greatest convex minorants along a sequence of decreasing "truncated" intervals
around $t_0$, and then establish a truncation result saying that asymptotically
the truncated intervals may replace $J$. Brunk [11] proved results for
$T(x_n)'$ with $x_n$ the partial sum process, using similar techniques and relying
on Prakasa Rao's result for the truncation reasoning. Wright [47] extended
Brunk's result to cover monotone densities satisfying other smoothness
assumptions, using a slightly different approach for the truncation proof. The
methods used in these papers rely heavily on the fact that data are independent,
using martingale results, and also on the fact that the limit process is
a Brownian motion.

Leurgans [33] extended Wright's result to dependent data. The limit
process is still assumed to be a Brownian motion, which could imply applications
to weakly dependent data. However, the two applications given in [33] both
deal with independent data (isotonic regression for independent and not
identically distributed data, and isotonized quantile estimation for
independent data). Next, Groeneboom [21] gave a different proof of Prakasa Rao's
result, introducing strong approximation techniques (cf. [32]), and proved
that the right-hand side of (4) is $2\operatorname{argmin}_{s\in\mathbb{R}}(s^2 + B(s))$; for the truncation
result, a reference was made to Prakasa Rao's paper. Mammen [34] showed
that a kernel estimate of $m$ with bandwidth $h \sim n^{-1/5}$ is first-order
asymptotically equivalent to the estimate obtained by doing isotonic regression on
the kernel estimate, thus obtaining the limit distribution for the isotonized
kernel estimate. Wang [46] also used strong approximation to derive the
limit distribution of the primitive function of the Grenander estimator.

A first example of a more general asymptotic theory of derivatives of greatest
convex minorants can be found in [33], which potentially covers weakly
dependent data. In our paper we treat both the convex minorant (1) and its
derivative (2), arbitrary (nonpolynomial) sequences $d_n \downarrow 0$, as well as a large
class of limit processes $v(\cdot)$, that is, not only Brownian motion. Thus, we
are able to apply our general results also to estimation for dependent data
(both short range and long range) and to estimates $x_n$ other than the
partial sum process or empirical process, such as, for example, kernel
estimates. Our method of proof is similar to the classical proof of Prakasa Rao
[39], based on first considering greatest convex minorants along a sequence of
decreasing "truncated" intervals around $t_0$ and then establishing a truncation
result saying that asymptotically the truncated intervals may replace $J$.
However, we decompose $x_n$ into a sum of a deterministic convex function
$x_{b,n}$ and a stochastic part $v_n$. In this way we get very explicit regularity
conditions that are possible to verify in a number of applications. Further,
we use only weak convergence of a rescaled version of $v_n$ and do not refer to
strong approximations, thereby obtaining greater generality. This relies on
the application of the continuous mapping theorem and, thus, the continuity
of the map $T_J : D(J) \to C(J)$ is essential. Furthermore, we state conditions
under which the continuous mapping theorem can be applied to the
functional $x \mapsto T_J(x)'(t_0)$ (cf. Proposition 2). Such a condition automatically
holds for Brownian motion and seems to have been implicitly assumed in
previous work.

The article is organized as follows: Section 2 establishes the main
convergence results (7) and (8) in Theorems 1 and 2, respectively. These results
are then applied in Sections 3 and 4 to regression and density function
estimation, respectively. In Section 5 a general formula is presented, which
describes how $d_n$ depends on various properties of $x_n$, for example, local self
similarity around $t_0$. In Section 6 we discuss possible extensions and
generalizations. Finally, we have collected the proofs of the results in Section 2
and some technical empirical process and partial sum process results in the
Appendix.

2. Limit distributions. Let $J \subset \mathbb{R}$ be a finite or infinite interval in $\mathbb{R}$ and
define $D(J)$ as the space of functions $J \to \mathbb{R}$ which are right continuous with
left-hand limits.

Assume $\{x_n\}_{n\ge 1}$ is a sequence of stochastic processes in $D(J)$ for which
we can write

(9) $x_n(t) = x_{b,n}(t) + v_n(t),$

where $x_{b,n}$ is deterministic and $v_n$ is also a member of $D(J)$. In this section
we will derive limit distributions of $T_J(x_n)$ and $T_J(x_n)'$ for a large class of
stochastic processes $x_n$. Our main assumptions on $x_n$ are that the process
part $v_n$ can be rescaled in a way close to the self similarity property, and
that the rescaled process converges weakly to some limit process. Given
a sequence $d_n \downarrow 0$, we rescale $v_n$ locally around an interior point $t_0$ of $J$
according to

$\tilde v_n(s; t_0) = d_n^{-p}\bigl(v_n(t_0 + s d_n) - v_n(t_0)\bigr),$

where $1 < p < \infty$ is a fixed constant and $s \in J_{n,t_0} = d_n^{-1}(J - t_0)$. Thus,
$\tilde v_n(\cdot\,; t_0) \in D(J_{n,t_0})$.

Many of the results on weak convergence are stated as results in $D[0,1]$
equipped with the Skorokhod metric. There are two reasons why this will
not be appropriate for our needs. The first is that processes treated in our
applications are not random elements of $D[0,1]$. For instance, $\tilde v_n(s; t_0)$ is
defined on $D(J_{n,t_0}) = D[-a_n, b_n]$, where $a_n, b_n \to \infty$ as $n \to \infty$. The second
reason is that the Skorokhod metric is too weak for the application we have
in mind: the greatest convex minorant function $T : D[0,1] \to C[0,1]$ will not
be continuous if $D[0,1]$ is equipped with the Skorokhod topology. Thus, we
would not be able to use the continuous mapping theorem to show limit
distribution results for $T$ applied to a drift term plus a rescaled process.

The first problem is solved by working in $D(-\infty,\infty)$. For instance, $\tilde v_n(s; t_0)$
can be extrapolated according to

$\tilde v_n(s; t_0) = \tilde v_n(-a_n; t_0)$ if $s < -a_n$, and $\tilde v_n(s; t_0) = \tilde v_n(b_n; t_0)$ if $s > b_n$.

Thus, $\tilde v_n(s; t_0)$ will lie in $D(-\infty,\infty)$ for all $n$. To deal with the second
problem, we define a metric on $D(J)$ as follows: for $x, y \in D(J)$,

(10) $\rho(x,y) = \sum_{k=1}^{\infty} 2^{-k}\,\dfrac{\rho_k(x,y)}{1 + \rho_k(x,y)},$

where $\rho_k(x,y) = \sup_{s\in[-k,k]\cap J} |x(s) - y(s)|$; that is, we write $x_n \to x$ in $D(J)$
if, for each fixed $k$, $\sup_{s\in[-k,k]\cap J} |x_n(s) - x(s)| \to 0$. Note that if $|J| < \infty$, then
$\rho$ is equivalent to $\rho_J(x,y) = \sup_{s\in J} |x(s) - y(s)|$. By Theorem 23 in [38],
page 108, weak convergence in $D(-\infty,\infty)$ is equivalent to weak convergence
in $D[-k,k]$ of the processes restricted to $[-k,k]$, for every fixed $k$, where
each $D[-k,k]$ of course is equipped with the sup-norm metric over $[-k,k]$.
Note that with this metric the empirical process is not a measurable map
if we use the Borel $\sigma$-algebra on $D[-k,k]$. If we instead use the $\sigma$-algebra
generated by the open balls, the empirical process becomes measurable, and
that assumption is also made in [38]. In that case, however, the continuous
mapping theorem becomes somewhat more complicated, in that the set on
which the function has all its continuity points should satisfy a certain
regularity condition, as well as the usual demand that it have probability mass
one. In the case of the functional $x \mapsto T(x)(t)$ that is not a problem, since
this map is continuous everywhere; see (76) in Lemma A.1 in the sequel.
However, in the case of the functional $x \mapsto T(x)'(t)$, it does pose a potential
problem; see the proof of Proposition 2 and Note 2 in the sequel.

The next two assumptions are related to a local limit distribution result;
see Lemma A.2 in Appendix A and the proof of Theorem 2.

Assumption A1 (Weak convergence of rescaled stochastic term). Assume
there exists a stochastic process $v(\cdot\,; t_0) \ne 0$ such that

$\tilde v_n(\cdot\,; t_0) \xrightarrow{\mathcal{L}} v(\cdot\,; t_0)$

on $D(-\infty,\infty)$ as $n \to \infty$.

Assumption A2 (Bias term). Assume the functions $\{x_{b,n}\}_{n\ge 1}$ are convex.
Put

(11) $g_n(s) = d_n^{-p}\bigl(x_{b,n}(t_0 + s d_n) - l_n(s)\bigr), \qquad l_n(s) = x_{b,n}(t_0) + x_{b,n}'(t_0)\, s\, d_n,$

for $s \in J_{n,t_0}$. Assume there is a constant $A > 0$ such that for each $c > 0$,

(12) $\sup_{|s|\le c} \bigl|g_n(s) - A|s|^p\bigr| \to 0,$

as $n \to \infty$.



In applications we typically have a convex function $x_b$, such that either
$x_{b,n} = x_b$ or $x_{b,n} \to x_b$ as $n \to \infty$, satisfying

$x_b(t) = x_b(t_0) + x_b'(t_0)(t - t_0) + A|t - t_0|^p + o(|t - t_0|^p),$

as $t \to t_0$. In particular, $A = \tfrac{1}{2}x_b''(t_0)$ if $p = 2$.
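The convergence required in (12) can be checked numerically in a simple case. The Python sketch below assumes $p = 2$ and the hypothetical choice $m(t) = t^2$, so that $x_b(t) = t^3/3$ and $A = \tfrac{1}{2}x_b''(t_0) = t_0$; the function names and the example are illustrative and not part of the paper.

```python
def g_n(x_b, dx_b, t0, d_n, s, p=2):
    """Rescaled bias term g_n(s) = d_n^{-p} (x_b(t0 + s d_n) - l_n(s))."""
    l_n = x_b(t0) + dx_b(t0) * s * d_n      # tangent line of x_b at t0
    return (x_b(t0 + s * d_n) - l_n) / d_n ** p

# Hypothetical example: m(t) = t^2, hence x_b(t) = t^3 / 3 and x_b'(t) = t^2.
x_b = lambda t: t ** 3 / 3
dx_b = lambda t: t ** 2
t0 = 0.5
A = 0.5 * (2 * t0)                          # A = x_b''(t0) / 2

errors = {}
for d_n in (1e-1, 1e-2, 1e-3):
    # Largest deviation |g_n(s) - A s^2| over a few points with |s| <= 2.
    errors[d_n] = max(abs(g_n(x_b, dx_b, t0, d_n, s) - A * s * s)
                      for s in (-2.0, -1.0, 0.5, 2.0))
```

In this example $g_n(s) = t_0 s^2 + s^3 d_n/3$ exactly, so the deviation on $|s| \le c$ is at most $c^3 d_n/3$ and vanishes as $d_n \downarrow 0$.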
Define the rescaled function

(13) $y_n(s) = g_n(s) + \tilde v_n(s; t_0).$

The next two assumptions are related to a truncation result; see Lemma A.3
and Theorem A.1 in Appendix A.



Assumption A3 (Lower bound). For every $\delta > 0$, there are finite
$0 < \tau = \tau(\delta)$ and $0 < \kappa = \kappa(\delta)$ such that

$\liminf_{n\to\infty} P\Bigl(\inf_{|s|\ge\tau} \bigl(y_n(s) - \kappa|s|\bigr) > 0\Bigr) > 1 - \delta.$
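Assumption A4 (Small downdippings). Given $\varepsilon, \delta, \tilde\tau > 0$,

$\limsup_{n\to\infty} P\Bigl(\inf_{\tilde\tau\le s\le c} \dfrac{y_n(s)}{s} - \inf_{\tilde\tau\le s} \dfrac{y_n(s)}{s} > \varepsilon\Bigr) < \delta,$

$\limsup_{n\to\infty} P\Bigl(\inf_{s\le-\tilde\tau} \dfrac{y_n(s)}{s} - \inf_{-c\le s\le-\tilde\tau} \dfrac{y_n(s)}{s} < -\varepsilon\Bigr) < \delta,$

for all large enough $c > 0$.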



We will now present a slightly less general but more transparent version
of Assumptions A3 and A4, since in many of the applications it is possible
to establish a separate restriction on the process part of $y_n$.

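Proposition 1. Suppose Assumption A2 holds and that, for each $\varepsilon, \delta > 0$,
there is a finite $\tau = \tau(\varepsilon,\delta)$ such that

(14) $\limsup_{n\to\infty} P\Bigl(\sup_{|s|\ge\tau} \Bigl|\dfrac{\tilde v_n(s; t_0)}{g_n(s)}\Bigr| > \varepsilon\Bigr) < \delta.$

Then Assumptions A3 and A4 hold.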



The following assumption is also related to the truncation results, Lemma A.3
and Theorem A.1 in Appendix A.

Assumption A5 (Tail behavior of limit process). For each $\varepsilon, \delta > 0$, there
is a $\tau = \tau(\varepsilon,\delta) > 0$ so that

$P\Bigl(\sup_{|s|\ge\tau} \Bigl|\dfrac{v(s; t_0)}{A|s|^p}\Bigr| > \varepsilon\Bigr) < \delta.$



Theorem 1. Let $t_0$ be fixed and suppose Assumptions A1, A2, A3,
A4 and A5 hold. Then

(15) $d_n^{-p}\bigl[T_J(x_n)(t_0) - x_n(t_0)\bigr] \xrightarrow{\mathcal{L}} T\bigl[A|s|^p + v(s; t_0)\bigr](0),$

with $A > 0$ as in Assumption A2, as $n \to \infty$.

Proof. Denote $T_c = T_{[-c,c]}$, $T = T_{\mathbb{R}}$ and $T_{c,n} = T_{[t_0 - c d_n,\, t_0 + c d_n]}$. Clearly,

$d_n^{-p}\bigl(T_J(x_n)(t_0) - x_n(t_0)\bigr) = d_n^{-p}\bigl(T_J(x_n)(t_0) - T_{c,n}(x_n)(t_0)\bigr) + d_n^{-p}\bigl(T_{c,n}(x_n)(t_0) - x_n(t_0)\bigr).$

The truncation result in Lemma A.3 implies

$d_n^{-p}\bigl(T_{c,n}(x_n)(t_0) - T_J(x_n)(t_0)\bigr) \xrightarrow{P} 0$

if we first let $n \to \infty$ and then let $c \to \infty$. The local limit distribution result
of Lemma A.2 implies that

$d_n^{-p}\bigl(T_{c,n}(x_n)(t_0) - x_n(t_0)\bigr) \xrightarrow{\mathcal{L}} T_c(y(s))(0)$

as $n \to \infty$, where

(16) $y(s) = A|s|^p + v(s; t_0).$

Then use Theorem A.1, Proposition 1 and Assumption A5 with $y_n(s) = y(s)$
to deduce

$T_c(y(s))(0) - T(y(s))(0) \xrightarrow{P} 0$

as $c \to \infty$. An application of Slutsky's theorem completes the proof. □

Next we will study the limit distribution of the derivative $T(x_n)'$. There
are some extra difficulties in this case. One is that the processes $x_n$ need
not be differentiable. We therefore study the difference between $T(x_n)'$ and
$x_{b,n}'$ directly.

Since the functional

(17) $h : D[-c,c] \ni x \mapsto T(x)'(0)$

is not continuous, the next assumption is essential.

Assumption A6. Suppose $y_n, y$ are defined in (13) and (16). Then

$T_c(y_n)'(0) \xrightarrow{\mathcal{L}} T_c(y)'(0),$

as $n \to \infty$, for each $c > 0$.

We need some simple condition in order to check Assumption A6. 

Proposition 2. Assume $y$ takes its values in a separable set of
completely regular points (cf. [38]), with probability one. Suppose Assumptions
A1 and A2 hold and that for each $a \in \mathbb{R}$ and $c, \varepsilon > 0$,

(18) $P\bigl(y(s) - y(0) - as \ge \varepsilon|s| \text{ for all } s \in [-c,c]\bigr) = 0.$

Then Assumption A6 holds.

Note 1. Since $y(s) = v(s; t_0) + A|s|^p$ and $(A|s|^p)'(0) = 0$, (18) follows if
we can prove

(19) $P\bigl(v(\cdot\,; t_0) \in \Omega_c(a,\varepsilon)\bigr) = 0,$

for each $a \in \mathbb{R}$ and $c, \varepsilon > 0$, with $\Omega_c(a,\varepsilon)$ defined in the proof of Proposition 2
in Appendix A. But (19) follows if we can find a random variable $Z$ (which
may be deterministic) such that

(20) $P\Bigl(\liminf_{s\to 0+} \dfrac{v(s; t_0) - Zs}{|s|} \le 0\Bigr) = 1,$

(21) $P\Bigl(\liminf_{s\to 0-} \dfrac{v(s; t_0) - Zs}{|s|} \le 0\Bigr) = 1.$

Note that (20) and (21) hold if $v(s; t_0)$ is differentiable at 0 [take $Z =
v'(0; t_0)$]. We can also make use of (the lower half of) the iterated
logarithm law. Thus, with $Z = 0$, (20) and (21) follow if we can find a function
$\psi : \mathbb{R} \setminus \{0\} \to (0,\infty)$ such that

$P\Bigl(\liminf_{s\to 0+} \dfrac{v(s; t_0)}{\psi(s)} = -1\Bigr) = 1,$
$P\Bigl(\liminf_{s\to 0-} \dfrac{v(s; t_0)}{\psi(s)} = -1\Bigr) = 1.$

Note 2. If y is continuous almost surely, the separability and complete 
regularity assumptions in Proposition 2 are satisfied; see Chapters 4 and 5 
of [38]. All limit processes in this paper are almost surely continuous. 

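Theorem 2. Assume that Assumptions A1-A6 hold. Then

$d_n^{-p+1}\bigl[T(x_n)'(t_0) - x_{b,n}'(t_0)\bigr] \xrightarrow{\mathcal{L}} T\bigl(A|s|^p + v(s; t_0)\bigr)'(0)$

as $n \to \infty$. Further, if

(22) $P\bigl(T(A|s|^p + v(s; t_0))'(0) = a\bigr) = 0,$

then

$\lim_{n\to\infty} P\bigl\{d_n^{-p+1}[T(x_n)'(t_0) - x_{b,n}'(t_0)] < a\bigr\} = P\Bigl\{\operatorname{argmin}_{s\in\mathbb{R}}\bigl(A|s|^p + v(s; t_0) - as\bigr) > 0\Bigr\},$

with $A$ as in Assumption A2.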

Proof. We start by proving a local limit distribution result. A $t$ varying
in $I_n = [t_0 - c d_n, t_0 + c d_n]$ can be written as $t = t_0 + s d_n$ with $s \in [-c,c]$. Then

(23) $x_n(t_0 + s d_n) = v_n(t_0) + l_n(s) + d_n^{p}\bigl(g_n(s) + \tilde v_n(s; t_0)\bigr),$

with $g_n, l_n$ defined in Assumption A2. We use the representation (23) and
the chain rule to obtain

$T_{c,n}(x_n)'(t_0) = x_{b,n}'(t_0) + d_n^{p-1}\, T_c\bigl(g_n(s) + \tilde v_n(s; t_0)\bigr)'(0),$

so that, by Assumption A6,

(24) $d_n^{-p+1}\bigl(T_{c,n}(x_n)'(t_0) - x_{b,n}'(t_0)\bigr) = T_c(y_n)'(0) \xrightarrow{\mathcal{L}} T_c(y)'(0)$

as $n \to \infty$, with $y_n, y$ defined in (13) and (16). Applying Lemma A.3 with
A = 0, we obtain

(25) $\lim_{c\to\infty} \limsup_{n\to\infty} P\bigl(d_n^{-p+1}|T_{c,n}(x_n)'(t_0) - T(x_n)'(t_0)| > \varepsilon\bigr) = 0.$

Then, applying Theorem A.1 with $y_n(s) = y(s)$ and $I = \{0\}$, we get

(26) $\lim_{c\to\infty} P\bigl(|T_c(y(s))'(0) - T(y(s))'(0)| > \varepsilon\bigr) = 0.$

Now (24), (25) and (26) and Slutsky's theorem prove the first part of the
theorem.

For the second part of the theorem, we notice that if $P(T(y)'(0) = a) = 0$,
then

$\lim_{n\to\infty} P\bigl(d_n^{-p+1}(T(x_n)'(t_0) - x_{b,n}'(t_0)) < a\bigr) = P\bigl(T(y)'(0) < a\bigr).$

Since $T(x)'$ is defined as the right-hand derivative of $T(x)$ and $\operatorname{argmin}_{s\in\mathbb{R}}(x(s))$
as the infimum of all points at which the minimum is attained, it follows
that

(27) $\{T(y)'(0) < a\} = \bigl\{\operatorname{argmin}_{s\in\mathbb{R}}(y(s) - as) > 0\bigr\}.$

By the first half of the theorem,

$P\bigl(d_n^{-p+1}(T(x_n)'(t_0) - x_{b,n}'(t_0)) < a\bigr) \to P\bigl(T(y)'(0) < a\bigr),$

if Assumption (22) holds, and this concludes the proof. □

In our applications the limit process will have stationary increments.
Furthermore, it will be a two-sided version of a process defined on $\mathbb{R}^+$, and as
such, its distribution will be unaffected by reflections in the $y$ axis through
the origin. In these cases our results simplify.

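Assumption A7 (Stationarity). The process $v(\cdot\,; t_0)$ has stationary
increments, and

$\bigl(v(s_1; t_0), \ldots, v(s_k; t_0)\bigr) \stackrel{\mathcal{L}}{=} \bigl(v(-s_1; t_0), \ldots, v(-s_k; t_0)\bigr),$

for each $k$ and all $s_1, \ldots, s_k$.

Corollary 1. Suppose Assumptions A1-A7 hold and $p = 2$. Then

$d_n^{-1}\bigl(T(x_n)'(t_0) - x_{b,n}'(t_0)\bigr) \xrightarrow{\mathcal{L}} 2\sqrt{A}\,\operatorname{argmin}_{s\in\mathbb{R}}\Bigl(s^2 + v\Bigl(\dfrac{s}{\sqrt{A}}; t_0\Bigr)\Bigr)$

as $n \to \infty$, with $A$ as defined in Assumption A2.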
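Proof. We need to show that

$\lim_{n\to\infty} P\bigl(d_n^{-1}(T(x_n)'(t_0) - x_{b,n}'(t_0)) < a\bigr) = P\Bigl(\operatorname{argmin}_{s\in\mathbb{R}}\Bigl(s^2 + v\Bigl(\dfrac{s}{\sqrt{A}}; t_0\Bigr)\Bigr) < \dfrac{a}{2\sqrt{A}}\Bigr)$

at each $a$ satisfying

(28) $P\Bigl(\operatorname{argmin}_{s\in\mathbb{R}}\Bigl(s^2 + v\Bigl(\dfrac{s}{\sqrt{A}}; t_0\Bigr)\Bigr) = \dfrac{a}{2\sqrt{A}}\Bigr) = 0.$

Note that

$P\Bigl(\operatorname{argmin}_{s\in\mathbb{R}}\Bigl(s^2 + v\Bigl(\dfrac{s}{\sqrt{A}}; t_0\Bigr)\Bigr) < \dfrac{a}{2\sqrt{A}}\Bigr)$

(29) $\quad = P\Bigl(\operatorname{argmin}_{s\in\mathbb{R}}\Bigl(s^2 + v\Bigl(\dfrac{s}{\sqrt{A}} + \dfrac{a}{2A}; t_0\Bigr)\Bigr) > -\dfrac{a}{2\sqrt{A}}\Bigr)$

$\quad = P\Bigl(\operatorname{argmin}_{s\in\mathbb{R}}\bigl(y(s) - as\bigr) > 0\Bigr),$

where the first equality follows by Assumption A7, and the second by a
change of variables and completion of squares. Putting $h_a(y) = 1\{\operatorname{argmin}_{s\in\mathbb{R}}(y(s) - as) > 0\}$,
we can rewrite (28) as

(30) $\lim_{\varepsilon\downarrow 0} E h_{a+\varepsilon}(y) = E h_a(y).$

Note also that $h_{a+\varepsilon}(y)$ is monotone in $\varepsilon$ and decreases when $\varepsilon \downarrow 0$. Let
$D = \{z : \lim_{\varepsilon\downarrow 0} h_{a+\varepsilon}(z) \ne h_a(z)\}$. Then if $P(D) = 0$, we have

$E h_{a+\varepsilon}(y) = E h_{a+\varepsilon}(y) 1_{\{D^c\}}(y) \to E h_a(y) 1_{\{D^c\}}(y) = E h_a(y)$

as $\varepsilon \downarrow 0$ by monotone convergence, and, thus, (30) holds. But (27) implies

$h_{a+\varepsilon}(z) = 1 \iff T(z)'(0) < a + \varepsilon.$

Thus

$D = \{z : T(z)'(0) = a\},$

and the latter part of Theorem 2 completes the proof. □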
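The change of variables and completion of squares used for the second equality in (29) can be written out as follows. This is a sketch for $p = 2$ with $y(s) = As^2 + v(s; t_0)$ as in (16); the substitution variable $u$ is introduced here only for illustration.

```latex
\begin{align*}
y(s) - as &= A s^2 - a s + v(s; t_0)
           = A\Bigl(s - \frac{a}{2A}\Bigr)^{2} - \frac{a^{2}}{4A} + v(s; t_0), \\
\intertext{so that, substituting $s = \dfrac{a}{2A} + \dfrac{u}{\sqrt{A}}$ and dropping the constant $-a^{2}/(4A)$,}
\operatorname*{argmin}_{s\in\mathbb{R}} \bigl(y(s) - as\bigr) > 0
  &\iff
  \operatorname*{argmin}_{u\in\mathbb{R}} \Bigl(u^{2} + v\Bigl(\frac{u}{\sqrt{A}} + \frac{a}{2A}; t_0\Bigr)\Bigr) > -\frac{a}{2\sqrt{A}} .
\end{align*}
```

The additive constant does not move the minimizer, and $s > 0$ corresponds to $u > -a/(2\sqrt{A})$, which gives the threshold appearing in (29).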

Let $S(x_n)$ denote the least concave majorant of $x_n$. Limit distribution
results for $S(x_n)$ and $S(x_n)'$ now follow easily by noting that $S(x_n) =
-T(-x_n)$.

In the next two sections we will consider various applications of Theorems
1 and 2 when $p = 2$. Applications for other $p > 1$ in the independent data
case are treated in [47] and [33].
3. Regression. Assume $m$ is a function on the interval $J = [0,1] \subset \mathbb{R}$,
and $(y_i, t_i)$, $i = 1, \ldots, n$, are pairs of data satisfying

(31) $y_i = m(t_i) + \varepsilon_i,$

where the $t_i = i/n$ are the design points, that is, we have an equispaced
design. For later convenience, we define the error terms $\varepsilon_i$ for all integers $i$,
and assume that $\{\varepsilon_i\}_{i=-\infty}^{\infty}$ form a stationary sequence of random variables
with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2 < \infty$. Let $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} \varepsilon_i)$. Then the
two-sided partial sum process $w_n$ is defined by

$w_n\bigl(t_i + \tfrac{1}{2n}\bigr) = \sigma_n^{-1} \sum_{j=1}^{i} \varepsilon_j, \qquad i = 0, 1, 2, \ldots,$

with the analogous definition for negative $i$, and linearly interpolated between
these points. This process is right continuous with left-hand limits, so it lies
in the space $D(-\infty,\infty)$.

The dependence structure for the random parts, the $\varepsilon_i$, will determine the
limit distribution. Let $\mathrm{Cov}(k) = E(\varepsilon_1 \varepsilon_{1+k})$ denote the covariance function.
Then it is possible to distinguish between three cases [of which (i) is a special
case of (ii)]:

(i) Independence: the $\varepsilon_i$ are independent.

(ii) Weak dependence: $\sum_k |\mathrm{Cov}(k)| < \infty$.

(iii) Strong (long range) dependence: $\sum_k |\mathrm{Cov}(k)| = \infty$.

The first two cases are similar in the sense that, for these, $w_n$ has the same
limit distribution, namely, the Brownian motion. For the case of long range
dependence, the limit distributions are very different. Also, this case is the
most awkward to work with, and limit distribution results are known only
for subordinated processes, that is, when $\varepsilon_i$ is a function of an underlying
process with a parametric law. We will treat only subordinated Gaussian
processes, that is, the case when the underlying process is Gaussian. All results
stated will be for processes in $D(-\infty,\infty)$ with the uniform metric on compacta
defined in (10), and the $\sigma$-algebra generated by the open balls.

Most of the limit results stated for partial sum processes are results for
processes in $D[0,1]$ equipped with the Skorokhod metric. An examination
of the proofs of the limit distribution results for $D[0,1]$ shows that there is
nothing special about $[0,1]$; it can be replaced by $[0,k]$, for any finite $k$. This
means that the results can be seen as results for $D[0,k]$, with the Skorokhod
metric. If the limit process is in $C[0,k]$ a.s., we can use the Skorokhod-Dudley
theorem to get new random processes converging almost surely, that is, in
the Skorokhod metric on a set with probability one. But convergence in that
metric toward a continuous function implies convergence in the supnorm
metric, and this implies weak convergence in $D[0,k]$ with the supnorm
topology. Finally, this is made into a result for $D[-k,k]$, for the two-sided
partial sum process $w_n$.

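When the $\varepsilon_i$ are independent, we have the classical Donsker theorem (cf.
[8]), implying that

(32) $w_n \xrightarrow{\mathcal{L}} B,$

as $n \to \infty$, with $B$ a two-sided standard Brownian motion on $D(-\infty,\infty)$.

Next we treat weakly dependent data. The notion of weak dependence can
be formalized in several ways. We will use mixing conditions; for a survey
see [9]. Define the $\sigma$-algebras

$\mathcal{F}_k = \sigma\{\varepsilon_i : i \le k\},$
$\bar{\mathcal{F}}_k = \sigma\{\varepsilon_i : i \ge k\},$

where $\sigma\{\varepsilon_i : i \in I\}$ denotes the $\sigma$-algebra generated by $\{\varepsilon_i : i \in I\}$.

Definition 1. The stationary sequence $\{\varepsilon_i\}$ is said to be $\phi$-mixing or
$\alpha$-mixing, respectively, if there is a function $\phi(n)$ or $\alpha(n) \to 0$ as $n \to \infty$,
such that

$\sup_{A\in\bar{\mathcal{F}}_n} |P(A \mid \mathcal{F}_0) - P(A)| \le \phi(n),$
$\sup_{A\in\mathcal{F}_0,\ B\in\bar{\mathcal{F}}_n} |P(AB) - P(A)P(B)| \le \alpha(n).$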

Mixing conditions say that elements in sequences are almost independent 
if they are far away from each other. There are other ways to model weak 
dependence, such as the notion of mixingales introduced in [36], which is a 
special case of the processes treated in [26]. See also the results for short 
range dependent subordinated Gaussian sequences in [15]. 

Introduce

(33) $\kappa^2 = \mathrm{Cov}(0) + 2\sum_{k=1}^{\infty} \mathrm{Cov}(k)$

whenever the limit exists. The following results for mixing sequences are 
adapted from [37] and [26]. 

Assumption A8 ($\phi$-mixing). Assume $\{\varepsilon_i\}_{i\in\mathbb{Z}}$ is a stationary $\phi$-mixing
sequence with $E\varepsilon_i = 0$ and $E\varepsilon_i^2 < \infty$. Assume further $\sum_{k=1}^{\infty} \phi(k)^{1/2} < \infty$ and
$\kappa^2 > 0$ in (33).

Note that $\kappa^2$ exists and that

(34) $\dfrac{\sigma_n^2}{n} \to \kappa^2$

as $n \to \infty$ by Lemmas 20.1 and 20.3 in [8], if Assumption A8 is satisfied.
In [26] it is shown that Donsker's result (32) is implied by Assumption A8
and also by several other combinations of assumptions.

Assumption A9 ($\alpha$-mixing). Assume $\{\varepsilon_i\}_{i\in\mathbb{Z}}$ is a stationary $\alpha$-mixing
sequence with $E\varepsilon_i = 0$ and $E\varepsilon_i^4 < \infty$, $\kappa^2 > 0$ in (33) and $\sum_{k=1}^{\infty} \alpha(k)^{1/2-\epsilon} < \infty$,
for some $\epsilon > 0$.

From Lemma 20.1 in [8] and Theorem 17.2.2 in [29] it follows that $\kappa^2$
exists and that (34) holds, if Assumption A9 is satisfied. The results of
Peligrad [37] imply that if Assumption A9 holds, then Donsker's result (32)
follows.
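As a concrete instance of the constant $\kappa^2$ in (33), the Python sketch below evaluates a truncated version of the sum for a hypothetical AR(1) error sequence $\varepsilon_i = \varphi\,\varepsilon_{i-1} + u_i$ with unit innovation variance, for which $\mathrm{Cov}(k) = \varphi^{k}/(1 - \varphi^2)$ and $\kappa^2 = 1/(1 - \varphi)^2$. The AR(1) model and the function name are illustrative assumptions; the paper does not use this example, and no claim is made here about which mixing assumption it satisfies.

```python
def long_run_variance(cov, kmax=10_000):
    """Truncated version of kappa^2 = Cov(0) + 2 * sum_{k >= 1} Cov(k)."""
    return cov(0) + 2.0 * sum(cov(k) for k in range(1, kmax + 1))

# Hypothetical AR(1) covariance function with coefficient phi.
phi = 0.5
ar1_cov = lambda k: phi ** k / (1.0 - phi ** 2)

kappa2 = long_run_variance(ar1_cov)
closed_form = 1.0 / (1.0 - phi) ** 2
```

Since the covariances are summable, the truncated sum agrees with the closed form up to rounding, and $\kappa^2$ differs from the marginal variance $\mathrm{Cov}(0) = 1/(1 - \varphi^2)$ whenever $\varphi \ne 0$.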

To treat long range dependent data, assume $\{\xi_i\}_{i\in\mathbb{Z}}$ is a stationary
Gaussian process with mean zero and covariance function $\mathrm{Cov}(k) = E(\xi_i \xi_{i+k})$
such that $\mathrm{Cov}(0) = 1$ and $\mathrm{Cov}(k) = k^{-d} l_0(k)$, where $l_0$ is a function slowly
varying at infinity and $0 < d < 1$ is fixed. For a review of long range
dependence, see [6].

Let $g : \mathbb{R} \to \mathbb{R}$ be a measurable function and define $\varepsilon_i = g(\xi_i)$. Then we can
expand $g(\xi_i)$ in Hermite polynomials

$g(\xi_i) = \sum_{k=r}^{\infty} \dfrac{1}{k!}\,\eta_k h_k(\xi_i),$

with equality holding as a limit in $L^2(\phi)$, with $\phi$ the standard Gaussian
density function. Here $h_k$ are the Hermite polynomials of order $k$, the coefficients

$\eta_k = E\bigl(g(\xi_1) h_k(\xi_1)\bigr) = \int g(u) h_k(u) \phi(u)\,du$

are the $L^2(\phi)$-projections on $h_k$, and $r$ is the index of the first nonzero
coefficient in the expansion. Assuming that $0 < dr < 1$, the sequence $\{\varepsilon_i\}$
also exhibits long range dependence. In this case we say the sequence $\{\varepsilon_i\}$
is subordinated Gaussian long range dependent with parameters $d$ and $r$.
The results of Taqqu [43, 44] show that

$\sigma_n^{-1} \sum_{i\le nt} g(\xi_i) \xrightarrow{\mathcal{L}} z_{r,\beta}(t)$

in $D[0,1]$ equipped with the Skorokhod topology. Lemma 3.1 and
Theorem 3.1 in [43] show that the variance is $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} g(\xi_i)) =
\eta_r^2 n^{2-rd} l_1(n)(1 + o(1))$, where

(35) $l_1(k) = \dfrac{2}{r!(1 - rd)(2 - rd)}\, l_0(k)^r.$

The limit process $z_{r,\beta}$ is in $C[0,1]$ a.s., and is self similar with parameter

(36) $\beta = 1 - rd/2.$

That is, the processes $z_{r,\beta}(\delta t)$ and $\delta^{\beta} z_{r,\beta}(t)$ have the same finite-dimensional
distributions for all $\delta > 0$.

The limit process can, for arbitrary $r$, be represented by Wiener-Ito-Dobrushin
integrals as in [17]; see also the representation given in [44]. The
process $z_{1,\beta}(t)$ is fractional Brownian motion, $z_{2,\beta}(t)$ is the Rosenblatt
process, and the processes $z_{r,\beta}(t)$ are all non-Gaussian for $r \ge 2$; see Taqqu [43].
This implies that, under the above assumptions,

(37) $w_n \xrightarrow{\mathcal{L}} B_{r,\beta}$

in $D(-\infty,\infty)$, as $n \to \infty$, where $B_{r,\beta}$ are the two-sided versions of the
processes $z_{r,\beta}$.
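The index $r$ of the first nonzero Hermite coefficient can be found numerically for a given $g$. The Python sketch below computes $\eta_k = E(g(\xi) h_k(\xi))$ for standard Gaussian $\xi$ by the trapezoidal rule, using the probabilists' Hermite polynomials; the function names, the integration range and the tolerance are illustrative assumptions, and the search starts at $k = 1$ on the assumption that $E g(\xi) = 0$.

```python
import math

def hermite(k, x):
    """Probabilists' Hermite polynomial: h_0 = 1, h_1 = x,
    h_{j+1}(x) = x h_j(x) - j h_{j-1}(x)."""
    h_prev, h = 1.0, x
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, x * h - j * h_prev
    return h

def eta(g, k, lim=10.0, steps=20_000):
    """eta_k = E[g(xi) h_k(xi)] for xi ~ N(0, 1), by the trapezoidal rule."""
    dx = 2.0 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        u = -lim + i * dx
        w = 0.5 if i in (0, steps) else 1.0     # trapezoid end-point weights
        total += w * g(u) * hermite(k, u) * math.exp(-0.5 * u * u)
    return total * dx / math.sqrt(2.0 * math.pi)

def hermite_rank(g, kmax=6, tol=1e-8):
    """Smallest k >= 1 with a nonzero coefficient eta_k (None if none found)."""
    for k in range(1, kmax + 1):
        if abs(eta(g, k)) > tol:
            return k
    return None

rank_square = hermite_rank(lambda u: u * u - 1.0)   # g = h_2
rank_cube = hermite_rank(lambda u: u ** 3)          # u^3 = h_3 + 3 h_1
```

For $g(u) = u^2 - 1$ the first nonzero coefficient is $\eta_2 = 2$, so $r = 2$ and the limit in (37) would be of Rosenblatt type, while $g(u) = u^3$ has $\eta_1 = 3$ and hence $r = 1$.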

3.1. Isotonic regression. Assume the regression function $m$ in (31)
satisfies $m \in \mathcal{F} = \{\text{increasing functions}\}$. The problem of minimizing the sum of
squares $\sum_{i=1}^{n} (y_i - m(t_i))^2$ over the class $\mathcal{F}$ is known as the isotonic regression
problem. The nonparametric least squares estimator is obtained as

$\hat m(t) = T_{[0,1]}(x_n)'(t)$

(cf., e.g., [40]), where $x_n$ is defined as follows: Let $\tilde n = \tilde n(t) = \lfloor nt - 1/2 \rfloor$ and
put

$x_n(t) = n^{-1} \sum_{i=1}^{\tilde n} y_i + \dfrac{(nt - 1/2) - \tilde n}{n}\, y_{\tilde n+1}, \qquad t \in [0,1].$

The limit distribution of $T_{[0,1]}(x_n)'$ is known in the case of independent data
and is included in Theorem 3 in [35]; note also the results in [33] and [47].

Actually $T_{[0,1]}(x_n)'$ is the solution to the isotonic regression problem, no
matter what the dependence structure is, and we will derive the limit
distributions also in the weakly and long range dependent cases.
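The interpolated cumulative sum process $x_n$ can be coded directly from its definition. The following Python sketch assumes the data are given as a list `y` holding $y_1, \ldots, y_n$ and returns $x_n$ as a function of $t$; the function name and the restriction to $t \ge 1/(2n)$ are illustrative choices.

```python
import math

def cusum_diagram(y):
    """Return x_n as a function on [1/(2n), 1] for data y_1, ..., y_n."""
    n = len(y)
    cums = [0.0]
    for v in y:
        cums.append(cums[-1] + v)           # cums[k] = y_1 + ... + y_k

    def x_n(t):
        u = n * t - 0.5
        if u < 0 or t > 1:
            raise ValueError("t must lie in [1/(2n), 1]")
        # k plays the role of n-tilde; it is capped so that y_{k+1} exists.
        k = min(int(math.floor(u)), n - 1)
        return (cums[k] + (u - k) * y[k]) / n
    return x_n

# Illustrative data (not from the paper).
x_n = cusum_diagram([2, -1, 2, 3])
value_at_knot = x_n(0.375)      # t = (1 + 1/2)/n, equal to y_1 / n
value_midway = x_n(0.5)         # halfway between two knots
```

At the knots $t = (i + 1/2)/n$ the function equals $n^{-1}\sum_{j\le i} y_j$, and between knots it is linear with slope $y_{i+1}$, so the slopes of its greatest convex minorant are the isotonic regression values at the design points.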

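We can write

$$x_n(t) = x_{b,n}(t) + v_n(t),$$

with

$$x_{b,n}(t) = n^{-1}\sum_{i=1}^{\tilde n} m(t_i) + \frac{(nt - 1/2) - \tilde n}{n}\, m(t_{\tilde n + 1}),$$

$$v_n(t) = n^{-1}\sum_{i=1}^{\tilde n} \epsilon_i + \frac{(nt - 1/2) - \tilde n}{n}\, \epsilon_{\tilde n + 1}.$$

Note that $x_{b,n}$ is convex (recall that we assume $p = 2$ in Sections 3 and 4)
and that, because of the stationarity of $\{\epsilon_i\}_{i=-\infty}^{\infty}$,

$$\tilde v_n(s;t) = d_n^{-2}\,(v_n(t + s d_n) - v_n(t))
 = d_n^{-2} n^{-1} \sigma_{\hat n}\,(w_{\hat n}(t d_n^{-1} + s) - w_{\hat n}(t d_n^{-1}))
 \stackrel{\mathcal{L}}{=} d_n^{-2} n^{-1} \sigma_{\hat n}\, w_{\hat n}(s),$$

where $\hat n = n d_n$ and the last equality in distribution holds exactly when
$t = t_i$ for any $i$ and asymptotically for all $t$. Since we know that, under the
appropriate assumptions, $w_{\hat n} \xrightarrow{\mathcal{L}} w$ in $D(-\infty,\infty)$ for some process $w$, we
need to choose $d_n$ in such a way that $d_n^{-2} n^{-1} \sigma_{\hat n} \to c$ for some constant
$0 < c < \infty$. Thus,

$$\tilde v_n(s;t) \xrightarrow{\mathcal{L}} c\,w(s) =: \tilde v(s;t)$$

in $D(-\infty,\infty)$.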
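As a worked illustration of this choice, the scaling condition can be solved for $d_n$ in two benchmark cases. The second computation ignores the slowly varying factor $l_1$, which is a simplification made only for this sketch; the theorem below restores it through $l_2$.

```latex
% Independent errors: \sigma_{\hat n}^2 = \sigma^2 n d_n, so
d_n^{-2} n^{-1} \sigma_{\hat n} = \sigma\, d_n^{-3/2} n^{-1/2} = \sigma
\iff d_n = n^{-1/3}.

% Long range dependence with \sigma_{\hat n} \approx |\eta_r| (n d_n)^{1 - rd/2}
% (slowly varying factor dropped by assumption):
d_n^{-2} n^{-1} |\eta_r| (n d_n)^{1 - rd/2}
  = |\eta_r|\, d_n^{-1 - rd/2} n^{-rd/2} = |\eta_r|
\iff d_n = n^{-rd/(2 + rd)}.
```

Since $0 < rd < 1$, the second rate is slower than $n^{-1/3}$, which reflects the slower convergence under long range dependence.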

Theorem 3. Assume $m$ is increasing with $m'(t_0) > 0$ and $t_0 \in (0,1)$.
Let $\hat m(t_0) = T_{[0,1]}(x_n)'(t_0)$ be the solution to the isotonic regression problem.
Suppose that one of the following conditions holds:

(i) $\{\epsilon_i\}$ are independent and identically distributed with $E\epsilon_i = 0$ and
$\mathrm{Var}(\epsilon_i) = \sigma^2 < \infty$;

(ii) Assumption A8 or A9 holds, $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} \epsilon_i)$ and $\kappa^2$ is defined as
in (33);

(iii) $\epsilon_i = g(\xi_i)$ is a long range dependent subordinated Gaussian sequence
with parameters $d$ and $r$, and $\beta$ as in (36).

Then, correspondingly, we obtain

$$d_n^{-1} c_1(t_0)\,(\hat m(t_0) - m(t_0)) \xrightarrow{\mathcal{L}} \arg\min_{s \in \mathbb{R}}\,(s^2 + v(s)),$$

$$d_n^{-2} c_2(t_0) \left( \int_0^{t_0} \hat m(s)\,ds - x_n(t_0) \right) \xrightarrow{\mathcal{L}} T(s^2 + v(s))(0),$$

as $n \to \infty$ with, respectively:

(i) $v = B$, $d_n = n^{-1/3}$, $c_1(t_0) = 2^{-2/3} m'(t_0)^{-1/3} \sigma^{-2/3}$,
$c_2(t_0) = 2^{-1/3} m'(t_0)^{1/3} \sigma^{-4/3}$;

(ii) $v = B$, $d_n = n^{-1/3}$, $c_1(t_0) = 2^{-2/3} m'(t_0)^{-1/3} \kappa^{-2/3}$,
$c_2(t_0) = 2^{-1/3} m'(t_0)^{1/3} \kappa^{-4/3}$;

(iii) $v = B_{r,\beta}$, $d_n = l_2(n)\, n^{-rd/(2+rd)}$,
$c_1(t_0) = 2^{-1/(2-\beta)} m'(t_0)^{(\beta-1)/(2-\beta)} |\eta_r|^{-1/(2-\beta)}$,
$c_2(t_0) = 2^{-\beta/(2-\beta)} m'(t_0)^{\beta/(2-\beta)} |\eta_r|^{-2/(2-\beta)}$,

and $l_2$ is a slowly varying function related to $l_1$ as shown in the proof below.
Moreover,

$$n \sigma_{\tilde n}^{-1} \int_0^{t_0} (\hat m(s) - m(s))\,ds \xrightarrow{\mathcal{L}} w(1) \tag{38}$$

as $n \to \infty$, where $w(t) = B(t)$ in the cases (i) and (ii) and $w(t) = B_{r,\beta}(t)$ in
the case (iii), and $\tilde n = \lfloor n t_0 - 1/2 \rfloor$.

Proof. (i) (The independent case) We have $\sigma_{\hat n}^2 = \sigma^2 \hat n = \sigma^2 n d_n$, which
implies that we can choose $d_n = n^{-1/3}$, so that $c = d_n^{-2} n^{-1} \sigma_{\hat n} = \sigma$. The
rescaled process is $\tilde y_n(s) = g_n(s) + \tilde v_n(s;t)$, where $\tilde v_n, g_n$ are defined in
Section 2. From Donsker's theorem (32), it follows that $\tilde v_n(s;t_0) \xrightarrow{\mathcal{L}} \sigma B(s)$ as
$n \to \infty$ on $D(-\infty,\infty)$ and, thus, Assumption A1 is satisfied. Next define
$\bar m_n(t) = m(t_i)$ when $t_i - 1/(2n) < t \le t_i + 1/(2n)$, so that
$x_{b,n}(t) = \int_0^t \bar m_n(u)\,du$. Then

$$g_n(s) = d_n^{-2} \int_{t_0}^{t_0 + s d_n} (\bar m_n(u) - \bar m_n(t_0))\,du
 = d_n^{-2} \int_{t_0}^{t_0 + s d_n} (m(u) - m(t_0))\,du + r_n(s),$$

where the first term converges toward $As^2$ uniformly for $s$ on compacta,
with $A = m'(t_0)/2$, and

$$\sup_{|s| \le c} |r_n(s)| \le 2c\, d_n^{-1} \sup_{|u - t_0| \le c d_n} |m(u) - \bar m_n(u)| = O(n^{-1} d_n^{-1}) = o(1),$$

since $n^{-1} d_n^{-1} = c\, d_n \sigma_{\hat n}^{-1} (1 + o(1)) \to 0$, because $d_n \to 0$ and $\sigma_{\hat n} \to \infty$ and,
thus, Assumption A2 holds. Assumptions A3 and A4 follow by Proposition 1
and Lemma B.1 in Appendix B. Assumptions A5, A6 and A7 hold
by properties of the Brownian motion; see [42] for an LIL for Brownian
motion which shows Assumption A6 via Proposition 2 and Note 1. Thus, from
Theorem 1,

$$n^{2/3} \left( \int_0^{t_0} \hat m(s)\,ds - x_n(t_0) \right) \xrightarrow{\mathcal{L}} T(As^2 + \sigma B(s))(0)
 \stackrel{\mathcal{L}}{=} A^{-1/3} \sigma^{4/3}\, T(s^2 + B(s))(0)$$

as $n \to \infty$, where the equality follows from the self similarity of Brownian
motion. Furthermore, Corollary 1 implies

$$n^{1/3} (\hat m(t_0) - m(t_0)) \xrightarrow{\mathcal{L}} 2A^{1/2} \arg\min_{s \in \mathbb{R}} \left( s^2 + \sigma B\!\left( \frac{s}{\sqrt{A}} \right) \right)
 \stackrel{\mathcal{L}}{=} 2A^{1/3} \sigma^{2/3} \arg\min_{s \in \mathbb{R}}\,(s^2 + B(s))$$

as $n \to \infty$, where the equality follows from the self similarity of Brownian
motion, and the proof is complete.

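(ii) (The mixing case) Choosing $\hat n = n d_n$, we get, as in the independent
data case, $d_n^{-2} n^{-1} \sigma_{\hat n} \to \kappa$, so that $\tilde v(s;t_0) = \kappa B(s)$, with the choice
$d_n = n^{-1/3}$. The rest of the proof goes through as for independent data.

(iii) (The long range dependent case) In this case $\sigma_{\hat n}^2 = \eta_r^2 (n d_n)^{2-rd} l_1(\hat n)$.
We choose $d_n$ as

$$|\eta_r| = d_n^{-2} n^{-1} \sigma_{\hat n} = d_n^{-2} n^{-1} |\eta_r| (n d_n)^{1 - rd/2}\, l_1(n d_n)^{1/2}$$
$$\iff d_n^{1 + rd/2} = n^{-rd/2}\, l_1(n d_n)^{1/2}$$
$$\iff d_n = n^{-rd/(2+rd)}\, l_2(n),$$

where $l_2$ is another function slowly varying at infinity. Thus,

$$\tilde v_n(s;t_0) \xrightarrow{\mathcal{L}} |\eta_r|\, B_{r,\beta}(s)$$

on $D(-\infty,\infty)$, as $n$ (and $\hat n$) $\to \infty$, and Assumption A1 holds.
Assumption A2 is proved as for independent data and Assumptions A3 and A4
follow from Proposition 1 and Lemma B.1. Also Assumptions A5, A6 and A7
follow from the properties of $B_{r,\beta}$; see Proposition 2 for Assumption A6.
The assumptions of Theorem 1 are therefore satisfied and

$$d_n^{-2} \left( \int_0^{t_0} \hat m(s)\,ds - x_n(t_0) \right) \xrightarrow{\mathcal{L}} T(As^2 + |\eta_r| B_{r,\beta}(s))(0)
 \stackrel{\mathcal{L}}{=} A^{-\beta/(2-\beta)} |\eta_r|^{2/(2-\beta)}\, T(s^2 + B_{r,\beta}(s))(0)$$

as $n \to \infty$, where the equality follows from the self similarity of $B_{r,\beta}$.
Furthermore, Corollary 1 implies

$$d_n^{-1} (\hat m(t_0) - m(t_0)) \xrightarrow{\mathcal{L}} 2A^{1/2} \arg\min_{s \in \mathbb{R}} \left( s^2 + |\eta_r| B_{r,\beta}\!\left( \frac{s}{\sqrt{A}} \right) \right)
 \stackrel{\mathcal{L}}{=} 2A^{(1-\beta)/(2-\beta)} |\eta_r|^{1/(2-\beta)} \arg\min_{s \in \mathbb{R}}\,(s^2 + B_{r,\beta}(s)),$$

where the equality follows from the self similarity of $B_{r,\beta}$.
To show (38), note that, with $M(t) = \int_0^t m(s)\,ds$,

$$x_n(t_0) - M(t_0) = v_n(t_0) + (x_{b,n}(t_0) - M(t_0)) = n^{-1} \sigma_{\tilde n} w_{\tilde n}(1) + O_p(n^{-1}),$$

implying that

$$n \sigma_{\tilde n}^{-1} (x_n(t_0) - M(t_0)) \xrightarrow{\mathcal{L}} w(1)$$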
as $n \to \infty$. This proves the theorem. □

3.2. Estimating a convex regression function. Assume the regression function
$m$ in (31) belongs to the class $\mathcal{F}_2 = \{\text{convex functions}\}$. One natural
way to estimate $m$ based on the data is to do convex regression, that is, to
minimize the sum of squares over the class of convex functions. Algorithms
for the convex regression problem are given in [27] and [35], and the limit
distribution for independent data is presented in [23, 24]. We present here an
estimator of a convex regression function for which we are able to give the
limit distributions also in the weakly dependent and long range dependent
cases.

Define $\bar y_n : [1/n, 1] \to \mathbb{R}$ by linear interpolation of the points $\{(t_i, y_i)\}_{i=1}^{n}$,
and let

$$x_n(t) = h^{-1} \int k((t - u)/h)\, \bar y_n(u)\,du \tag{39}$$

be the Gasser-Müller kernel estimate of $m(t)$ (see [19]), where $k$ is a symmetric
density in $L^2(\mathbb{R})$ with compact support; for simplicity, take $\mathrm{supp}(k) = [-1,1]$;
$k$ is called the kernel function. Let $h$ be the bandwidth, for which
we assume that $h \to 0$, $nh \to \infty$. The exact choice of $h$ will be affected by
the dependence structure of $\{\epsilon_i\}$.

To define a convex estimator of $m$, we put

$$\tilde m(t) = \frac{T_{[0,1]}(x_n)(t)}{c(x_n)}, \tag{40}$$

where $c(x_n) = \int_J T_{[0,1]}(x_n)(t)\,dt \left( \int_J x_n(t)\,dt \right)^{-1}$ is a normalization constant
that ensures $\int_J \tilde m(t)\,dt = \int_J x_n(t)\,dt$. We will confine ourselves to studying
the asymptotics of $T_{[0,1]}(x_n)$, that is, the behavior of $\tilde m$ before normalization.
Kernel regression estimation for long range dependent errors is considered
in [13, 14].
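The two steps of this estimator, kernel smoothing followed by taking the greatest convex minorant, can be sketched on a grid as follows. This is an illustration under stated assumptions rather than the construction used in the proofs: the integral in (39) is replaced by a Riemann sum over an equidistant design, the Epanechnikov kernel is one arbitrary choice of $k$, and all function names are ours.

```python
def epanechnikov(u):
    """A symmetric kernel density supported on [-1, 1] (one possible choice of k)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_smooth(t_grid, t_obs, y_obs, h):
    """Riemann-sum stand-in for the kernel estimate (39); assumes t_obs is equidistant."""
    n = len(t_obs)
    return [
        sum(epanechnikov((t - ti) / h) * yi for ti, yi in zip(t_obs, y_obs)) / (n * h)
        for t in t_grid
    ]


def greatest_convex_minorant(t, x):
    """Values at t of the greatest convex minorant of the points (t[i], x[i]).

    Assumes t is strictly increasing. The minorant is the lower convex hull,
    interpolated linearly between hull vertices.
    """
    hull = []  # indices of the lower-hull vertices found so far
    for i in range(len(t)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (t[b] - t[a]) * (x[i] - x[a]) - (x[b] - x[a]) * (t[i] - t[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    gcm = [0.0] * len(t)
    for a, b in zip(hull, hull[1:]):
        for j in range(a, b + 1):
            gcm[j] = x[a] + (x[b] - x[a]) * (t[j] - t[a]) / (t[b] - t[a])
    if len(hull) == 1:
        gcm[hull[0]] = float(x[hull[0]])
    return gcm
```

A typical use evaluates `kernel_smooth` on an interior grid (at distance at least `h` from the endpoints, so that the kernel window is not truncated) and passes the result to `greatest_convex_minorant`.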

Clearly, $x_n(t) = x_{b,n}(t) + v_n(t)$, with

$$x_{b,n}(t) = h^{-1} \int k\!\left( \frac{t - u}{h} \right) \bar m_n(u)\,du,$$

$$v_n(t) = h^{-1} \int k\!\left( \frac{t - u}{h} \right) \bar\epsilon_n(u)\,du,$$

where the functions $\bar m_n$ and $\bar\epsilon_n$ are obtained by linear interpolation of
$\{(t_i, m(t_i))\}_{i=1}^{n}$ and $\{(t_i, \epsilon_i)\}_{i=1}^{n}$, respectively. For the deterministic term,
$x_{b,n}(t) \to x_b(t) = m(t)$, as $n \to \infty$. Note that $\bar m_n$, and thus also $x_{b,n}$, is
convex.
Put

$$\bar w_n(t) = \frac{n}{\sigma_n} \int_0^t \bar\epsilon_n(u)\,du. \tag{41}$$

Since $\mathrm{supp}(k) = [-1,1]$ and if $t \in (1/n + h, 1 - h)$, from a partial integration
and change of variable, we obtain

$$v_n(t) = \frac{\sigma_n}{nh} \int k'(u)\, \bar w_n(t - uh)\,du.$$

It can be shown that $\bar w_n$ and $w_n$ are asymptotically equivalent for all
dependence structures treated in this paper. This will henceforth be tacitly
assumed.

Recall that for the rescaling of $v_n$ we need to choose $d_n$ in a correct
way. Having done that choice, depending on the relation between the rate
of convergence to zero of the bandwidth $h$ and of $d_n$, we get different limit
results for $T(x_n)$. We have three subcases: $d_n = h$, $d_n/h \to 0$, or $d_n/h \to \infty$
as $n \to \infty$.

3.2.1. The case $d_n = h$. For $s > 0$, we rescale as

$$\tilde v_n(s;t) = d_n^{-2} (nh)^{-1} \sigma_{\hat n} \int (w_{\hat n}(h^{-1}t + s - u) - w_{\hat n}(h^{-1}t - u))\, k'(u)\,du$$
$$\stackrel{\mathcal{L}}{=} d_n^{-2} (nh)^{-1} \sigma_{\hat n} \int (w_{\hat n}(s - u) - w_{\hat n}(-u))\, k'(u)\,du,$$

with $\hat n = nh$, where the last equality holds exactly only for $t = t_i$ and
asymptotically otherwise. Note that the right-hand side holds also for $s < 0$.
Assume $d_n = h$ is such that

$$d_n^{-2} (nh)^{-1} \sigma_{\hat n} = d_n^{-3} n^{-1} \sigma_{\hat n} \to c > 0. \tag{42}$$

Then, under conditions given in the beginning of this section, $w_{\hat n} \xrightarrow{\mathcal{L}} w$ in
$D(-\infty,\infty)$, using the supnorm over compacta metric. Note that if $k'$ is
bounded and $k$ has compact support, the map

$$D(-\infty,\infty) \ni z(s) \mapsto \int (z(s - u) - z(-u))\, k'(u)\,du \in D(-\infty,\infty)$$

is continuous, using the supnorm over compacta. Thus, the continuous mapping
theorem implies that

$$\tilde v_n(s;t) \xrightarrow{\mathcal{L}} \tilde v(s;t) = c \int (w(s - u) - w(-u))\, k'(u)\,du. \tag{43}$$

Define $\hat m = T_{[0,1]}(x_n)$, and note in the following theorem that the rate
$n^{-2/5}$ in the independent data case is the same as the rate in the limit
distribution result for the convex regression; see [23, 24].

Theorem 4. Assume $m$ is convex with $m''(t_0) > 0$ and $t_0 \in (0,1)$. Let
$x_n$ be the kernel estimate of $m$ defined in (39), with a nonnegative and
compactly supported kernel $k$ such that $k'$ is bounded, and with bandwidth $h$
specified below. Suppose that one of the following conditions holds:

(i) $\{\epsilon_i\}$ are independent and identically distributed with $E\epsilon_i = 0$ and
$\sigma^2 = \mathrm{Var}(\epsilon_i) < \infty$ and we choose $h = a n^{-1/5}$, where $a > 0$ is an arbitrary
constant;

(ii) Assumption A8 or A9 holds, $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} \epsilon_i)$ and $\kappa^2$ is defined
in (33), and we choose $h = a n^{-1/5}$, where $a > 0$ is an arbitrary constant;

(iii) $\epsilon_i = g(\xi_i)$ is a long range dependent subordinated Gaussian sequence
with parameters $d$ and $r$, and $\beta$ as in (36) and we choose $h = l_2(n;a)\, n^{-rd/(4+rd)}$,
where $a > 0$ and $n \mapsto l_2(n;a)$ is a slowly varying function defined in the proof
below.

Then, correspondingly, we obtain

$$d_n^{-2} (T_{[0,1]}(x_n)(t_0) - m(t_0)) \xrightarrow{\mathcal{L}} \tfrac{1}{2} m''(t_0) \int u^2 k(u)\,du + c \int k'(u)\, w(-u)\,du
 + T\!\left( \tfrac{1}{2} m''(t_0) s^2 + \tilde v(s;t_0) \right)(0),$$

as $n \to \infty$, where $\tilde v(s;t)$ is defined in (43), $d_n = h$ and, respectively:

(i) $w = B$, $c = \sigma a^{-5/2}$,

(ii) $w = B$, $c = \kappa a^{-5/2}$,

(iii) $w = B_{r,\beta}$, $c = |\eta_r| a$.

Proof. (i) (Independent case) We have $\sigma_{\hat n}^2 = \sigma^2 n d_n$. Thus,
$d_n^{-2} (nh)^{-1} \sigma_{\hat n} = \sigma n^{-1/2} h^{-5/2}$, and (42) is satisfied with $c = \sigma a^{-5/2}$ if
$d_n = h = a n^{-1/5}$. From (32), it follows that Assumption A1 holds, with $\tilde v$ as in (43).

Define $g_n$ as in Assumption A2. Notice that

$$g_n(s) = h^{-2} \int l(u)\, \bar m_n(t_0 - hu)\,du = h^{-2} \int l(u)\, m(t_0 - hu)\,du + r_n(s),$$

with $l(v) = k(v + s) - k(v) - s k'(v)$. Since

$$\int v^{\lambda} l(v)\,dv = \begin{cases} 0, & \lambda = 0, 1, \\ s^2, & \lambda = 2, \end{cases}$$

it follows by a Taylor expansion of $m$ around $t_0$ that the first term converges
to $As^2$, since $A = m''(t_0)/2$. The convergence is uniform with respect to $s$
over compact intervals, since the limit function $As^2$ is convex; see [25] and
Theorem 10.8 in [41]. For the second term, notice that

$$\sup_{|s| \le c} |r_n(s)| \le h^{-2} \sup_{|s| \le c} \int |l(u)|\,du \sup_{|u - t_0| \le (c+1)h} |\bar m_n(u) - m(u)| = O(n^{-1} h^{-2}) = o(1),$$

since $n^{-1} h^{-2} = (1 + o(1))\, c h / \sigma_{\hat n} \to 0$, because $h \to 0$ and $\sigma_{\hat n} \to \infty$, which proves
Assumption A2.

Assumptions A3 and A4 are satisfied by Proposition 1 and Lemmas
B.1 and B.4 in Appendix B, and Assumption A5 holds by properties of
Brownian motion. An application of Theorem 1 shows that

$$d_n^{-2} (T_{[0,1]}(x_n)(t_0) - x_n(t_0)) \xrightarrow{\mathcal{L}} T\!\left( \tfrac{1}{2} m''(t_0) s^2 + \tilde v(s;t_0) \right)(0) \tag{44}$$

as $n \to \infty$. Furthermore,

$$d_n^{-2} (x_n(t_0) - x_{b,n}(t_0)) = d_n^{-2} v_n(t_0) = d_n^{-2} (nh)^{-1} \sigma_n \int k'(u)\, w_n(t_0 - uh)\,du$$
$$= d_n^{-3} n^{-1} \sigma_{nh} \int k'(u)\, w_{nh}(h^{-1} t_0 - u)\,du \xrightarrow{\mathcal{L}} c \int k'(u)\, w(-u)\,du, \tag{45}$$

$$d_n^{-2} (x_{b,n}(t_0) - m(t_0)) \to \tfrac{1}{2} m''(t_0) \int u^2 k(u)\,du, \tag{46}$$

as $n \to \infty$. Since the process $w_n$ in (45) is the same as in the definition of
$\tilde v_n$, one can make the rescaling in (44) and (45) simultaneously to get joint
convergence of (44) and (45); together with (46), this proves the theorem
for the independent data case.

(ii) (Mixing case) The proof is similar to the proof of (i), replacing $\sigma$ by
$\kappa$.

(iii) (Long range dependent data case) We want to choose $d_n = h$ so that
(42) is satisfied with $c = |\eta_r| a$. Since $\sigma_{\hat n}^2 = \eta_r^2 (n d_n)^{2-rd} l_1(n d_n)$, we get

$$|\eta_r| a = d_n^{-3} n^{-1} |\eta_r| (n d_n)^{1 - rd/2}\, l_1(n d_n)^{1/2}
 \iff d_n = n^{-rd/(4+rd)}\, l_2(n;a), \tag{47}$$

where $l_2$ is another function slowly varying at infinity, implicitly defined in
(47). We check the assumptions of Theorem 1 similarly as for (i) and (ii).
□

In practice, it can be preferable to normalize the estimator, as in (40).
It is an interesting problem to study the asymptotics for the normalized
estimator $\tilde m$; we conjecture the same rate of convergence to hold and note
that the integrated mean square error is smaller for the corrected estimator.

3.2.2. The case $d_n/h \to \infty$ as $n \to \infty$. This subcase is the least interesting
for us, since the limit processes essentially are white noise processes;
see [3].

3.2.3. The case $d_n/h \to 0$ as $n \to \infty$. In this case we can state limit
distributions for $T_{[0,1]}(x_n)$ centered around $\bar m_n$; the bias term, however, is of
a larger order and thus the estimator has no useful statistical consequence;
see [3] for details.

3.3. Kernel estimation followed by isotonic regression. Suppose $(t_i, y_i)$
are pairs of data satisfying the relation (31). Assuming that $m$ is increasing,
there is an alternative to doing isotonic regression. Define first

$$\bar y_n(t) = y_i, \qquad t_i - \frac{1}{2n} < t \le t_i + \frac{1}{2n}, \quad i = 1, \ldots, n,$$

as the piecewise constant interpolation of $\{y_i\}$. Similarly, we define $\bar m_n$
and $\bar\epsilon_n$ from $\{m(t_i)\}$ and $\{\epsilon_i\}$. Compute the Gasser-Müller kernel estimate
(see [19]),

$$\tilde m_n(t) = h^{-1} \int k\!\left( \frac{t - u}{h} \right) \bar y_n(u)\,du,$$

of $m$ and then do isotonic regression on the data $(t, \tilde m_n(t))_{0 \le t \le 1}$. We do
isotonic regression according to $\hat m(t) = T_{[0,1]}(x_n)'(t)$, where

$$x_n(t) = \int_{-\infty}^{t} \tilde m_n(u)\,du = \int K\!\left( \frac{t - u}{h} \right) \bar y_n(u)\,du \tag{48}$$

is the primitive of $\tilde m_n$, and $K(t) = \int_{-\infty}^{t} k(u)\,du$. This is considered in [34],
where the limit distribution is given for i.i.d. data and for the particular
choice of bandwidth $h = n^{-1/5}$. In [34] the reverse scheme is treated also,
that is, isotonic regression followed by smoothing, for i.i.d. data.
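On a grid, the isotonisation step can be written directly with the max-min formula (2) applied to cumulative sums of the smoothed values. The sketch below assumes values on an equidistant grid and favours transparency over speed (it runs in cubic time); the name `isotonic_by_maxmin` is hypothetical.

```python
def isotonic_by_maxmin(values):
    """Nondecreasing least-squares fit via max over v < j of min over u >= j of block averages."""
    n = len(values)
    # cumsum[j] is the sum of the first j values: a discrete analogue of the primitive x_n.
    cumsum = [0.0]
    for value in values:
        cumsum.append(cumsum[-1] + value)
    fit = []
    for j in range(1, n + 1):
        # Slope of the chord from v to u in the cumulative sum diagram.
        best = max(
            min((cumsum[u] - cumsum[v]) / (u - v) for u in range(j, n + 1))
            for v in range(0, j)
        )
        fit.append(best)
    return fit
```

The output agrees with what a pooling algorithm such as PAVA would return for the same input; the max-min form simply makes the link to the slope of the greatest convex minorant explicit.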

The deterministic and stochastic parts of $x_n$ are defined according to

$$x_n(t) = x_{b,n}(t) + v_n(t),$$

with

$$x_{b,n}(t) = \int K\!\left( \frac{t - u}{h} \right) \bar m_n(u)\,du,$$

$$v_n(t) = \int K\!\left( \frac{t - u}{h} \right) \bar\epsilon_n(u)\,du = \frac{\sigma_n}{n} \int k(u)\, w_n(t - uh)\,du.$$

Notice that

$$x_{b,n}'(t) = h^{-1} \int k\!\left( \frac{t - u}{h} \right) \bar m_n(u)\,du$$

is increasing since $\bar m_n$ is, and thus $x_{b,n}$ is convex. Notice further that, for
the bias of $x_{b,n}'$, we have

$$x_{b,n}'(t_0) - m(t_0) = h^{-1} \int k\!\left( \frac{t_0 - u}{h} \right) (\bar m_n(u) - m(t_0))\,du
 \sim \tfrac{1}{2} m''(t_0)\, h^2 \int u^2 k(u)\,du \tag{49}$$

as $n \to \infty$, if we assume that $m''(t_0)$ exists.

For the stochastic part, again we get different results depending on the
asymptotic size of $d_n/h$.

3.3.1. The case $d_n = h$. The random part can, for $s > 0$, be rescaled as

$$\tilde v_n(s;t) = d_n^{-2} n^{-1} \sigma_{\hat n} \int (w_{\hat n}(h^{-1}t + s - u) - w_{\hat n}(h^{-1}t - u))\, k(u)\,du$$
$$\stackrel{\mathcal{L}}{=} d_n^{-2} n^{-1} \sigma_{\hat n} \int [w_{\hat n}(s - u) - w_{\hat n}(-u)]\, k(u)\,du,$$

with $\hat n = n d_n$, the right-hand side valid also for $s < 0$, and the last equality
being exact only for $t = t_i$ and holding asymptotically otherwise. Assuming
that

$$c_n = d_n^{-2} n^{-1} \sigma_{\hat n} \to c > 0, \tag{50}$$

the integrability of $k$, $w_{\hat n} \xrightarrow{\mathcal{L}} w$ on $D(-\infty,\infty)$ and the continuous mapping
theorem imply that

$$\tilde v_n(s;t) \xrightarrow{\mathcal{L}} \tilde v(s;t) = c \int (w(s - u) - w(-u))\, k(u)\,du \tag{51}$$

on $D(-\infty,\infty)$ as $n \to \infty$.

Note that (49) implies that in the following two theorems $x_{b,n}'(t_0)$ can be
replaced by $m(t_0)$.

Theorem 5. Assume $m$ is increasing, $m'(t_0) > 0$ and $t_0 \in (0,1)$.
Assume that $\{\epsilon_i\}$ are independent and identically distributed with $E(\epsilon_i) = 0$
and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Define $x_n$ as in (48), with a nonnegative and compactly
supported kernel $k$ having a bounded derivative $k'$, and with bandwidth $h$
specified below. Let $\hat m(t) = T_{[0,1]}(x_n)'(t)$. Suppose that one of the following
conditions holds:

(i) $\{\epsilon_i\}$ are independent and identically distributed with $E\epsilon_i = 0$ and
$\sigma^2 = \mathrm{Var}(\epsilon_i) < \infty$ and we choose $h = a n^{-1/3}$, where $a > 0$ is an arbitrary
constant;

(ii) Assumption A8 or A9 holds, $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} \epsilon_i)$ and $\kappa^2$ is defined
in (33) and we choose $h = a n^{-1/3}$, where $a > 0$ is an arbitrary constant;

(iii) $\epsilon_i = g(\xi_i)$ is a long range dependent subordinated Gaussian sequence
with parameters $d$ and $r$, and $\beta$ as in (36), and we choose $h = l_2(n;a)\, n^{-rd/(2+rd)}$,
where $a > 0$ and $n \mapsto l_2(n;a)$ is a slowly varying function defined in the proof
below.

Then, correspondingly, we obtain

$$d_n^{-1} (\hat m(t_0) - x_{b,n}'(t_0)) \xrightarrow{\mathcal{L}} (2m'(t_0))^{1/2}\, a \cdot \arg\min_{s \in \mathbb{R}} \left( s^2 + \tilde v\!\left( \frac{s}{\sqrt{m'(t_0)/2}};\, t_0 \right) \right)$$

as $n \to \infty$, where, respectively:

(i) $d_n = n^{-1/3}$, $w = B$ in (51), $c = a^{-3/2} \sigma$,

(ii) $d_n = n^{-1/3}$, $w = B$ in (51), $c = a^{-3/2} \kappa$,

(iii) $d_n = l_2(n;a)\, n^{-rd/(2+rd)}$, $w = B_{r,\beta}$ in (51), $c = |\eta_r| a$,

and with $c = a^{-3/2} \sigma$, $w = B$ in (51).

Proof. (i) (Independent data case) Since $\sigma_{\hat n}^2 = \sigma^2 \hat n = \sigma^2 n d_n$, we get
$d_n^{-2} n^{-1} \sigma_{\hat n} = d_n^{-3/2} n^{-1/2} \sigma$. Putting $d_n = h = a n^{-1/3}$, we thus get $c = a^{-3/2} \sigma$.
Let us now verify the theorem from Corollary 1. From (32) and (51), we
deduce Assumption A1. Notice that

$$x_{b,n}(t) = h^{-1} \int k\!\left( \frac{t - u}{h} \right) \bar x_{b,n}(u)\,du,$$

with $\bar x_{b,n}(t) = \int^{t} \bar m_n(u)\,du$ a piecewise linear approximation of the convex
function $x_b(t) = \int^{t} m(u)\,du$. Thus, the rest of the proof of Assumption A2
is similar to Theorem 4(i), replacing $\bar m_n$ and $m$ in Theorem 4(i) with $\bar x_{b,n}$ and
$x_b$ respectively. Clearly, $|\bar x_{b,n} - x_b| = O(n^{-1})$ uniformly on compact subsets
of $(0,1)$, since the same is true for $|\bar m_n - m|$. Furthermore, Assumptions
A3 and A4 follow by Proposition 1 and Lemmas B.1 and B.4 in Appendix B.

Since $\tilde v$ has stationary increments, Assumption A7 holds and Assumption A5
is motivated as in previous results. Furthermore, since $k'$ exists, $\tilde v$ is
differentiable and thus Assumption A6 holds (cf. Note 1). Corollary 1, with
$A = m'(t_0)/2$, now implies the theorem.

(ii) (Mixing data case) For mixing data, $\sigma_{\hat n} \sim \kappa\, \hat n^{1/2}$. The rest of the proof
is similar to the independent data case.

(iii) (Long range dependent data) Choose $d_n = h$ so that (50) is satisfied
with $c = |\eta_r| a$. Since the variance is $\sigma_{\hat n}^2 = (n d_n)^{2-rd} \eta_r^2\, l_1(n d_n)$, with $l_1$ a
slowly varying function, we get

$$|\eta_r| a = d_n^{-2} n^{-1} (n d_n)^{1 - rd/2} |\eta_r|\, l_1(n d_n)^{1/2}
 \iff d_n = n^{-rd/(2+rd)}\, l_2(n;a), \tag{52}$$

where $l_2$ is another function slowly varying at infinity, implicitly defined by
(52). Thus, with $h = d_n$, we obtain the theorem from Corollary 1.
Assumptions A1-A7 are checked as in the last two theorems. □

3.3.2. The case $d_n/h \to \infty$ as $n \to \infty$. Rescale the random part as

$$\tilde v_n(s;t) \stackrel{\mathcal{L}}{=} d_n^{-2} n^{-1} \sigma_{\hat n} \int (w_{\hat n}(s - uh/d_n) - w_{\hat n}(-uh/d_n))\, k(u)\,du,$$

with $\hat n = n d_n$, holding exactly for $t = t_i$ and asymptotically for all $t$. Assume
that

$$d_n^{-2} n^{-1} \sigma_{\hat n} \to c \tag{53}$$

as $n \to \infty$. Write $\tilde v_n(\cdot\,;t) = \Lambda_n w_{\hat n}$, with $\Lambda_n$ an operator $D(-\infty,\infty) \to
D(-\infty,\infty)$. Then $\Lambda_n w \to \Lambda w = cw$ as $n \to \infty$ whenever $w \in C(-\infty,\infty)$.
Thus, with $w$ the weak limit of $\{w_n\}$, the extended continuous mapping
theorem implies

$$\tilde v_n(s;t_0) \xrightarrow{\mathcal{L}} \tilde v(s;t_0) = c\,(w(s) - w(0)) = c\,w(s). \tag{54}$$

But (53) and (54) are identical to the results for isotonic regression in
Section 3.1 and, thus, the limit distributions must be the same.

For the bandwidths, (53) entails $h \ll n^{-1/3}$ in the independent and weakly
dependent cases, and $h \ll l_2(n)\, n^{-rd/(2+rd)}$ in the long range dependent case,
with $l_2(n)$ the same slowly varying function as in Theorem 3(iii).

3.3.3. The case $d_n/h \to 0$ as $n \to \infty$. We rescale as

$$\tilde v_n(s;t) \stackrel{\mathcal{L}}{=} d_n^{-2} n^{-1} \sigma_{\hat n} \int (w_{\hat n}(s d_n/h - u) - w_{\hat n}(-u))\, k(u)\,du$$
$$= d_n^{-1} (nh)^{-1} \sigma_{\hat n} \int w_{\hat n}(u)\, \frac{k(s d_n/h - u) - k(-u)}{d_n/h}\,du,$$

with $\hat n = nh$. Assume that $(d_n nh)^{-1} \sigma_{\hat n} \to c > 0$, and that $k$ is of bounded
variation. Then, since $(h/d_n)(k(s d_n/h - u) - k(-u)) \to k'(-u)s$ for $s$ in
compact sets, we can use the extended continuous mapping theorem to obtain

$$\tilde v_n(s;t) \xrightarrow{\mathcal{L}} \tilde v(s;t) = c \int w(u)\, k'(-u)\,du \cdot s = c\,\omega \cdot s,$$

where $w$ is the weak limit of $\{w_n\}$. Here $\omega \in N(0, \int k^2(u)\,du)$ for independent
and weakly dependent data, $\omega$ is Gaussian for long range dependent data
with rank $r = 1$ and non-Gaussian for $r > 1$. Note that

$$T(s^2 + c\omega s) = s^2 + c\omega s,$$

$$T(s^2 + c\omega s)'(0) = c\omega.$$

When the limit process is $T(s^2 + c\omega s)(0) = 0$, this implies that we should
study the rescaling and choice of normalizing constants more carefully, in
order to get a nontrivial limit. In order to keep things simple, we skip this
and give proofs only for the regression function.

We prove limit results for the properly normalized difference $T(x_n)'(t) -
x_{b,n}'(t)$ only. Note the relation (49) for the asymptotic bias. For independent
data and with the choice of bandwidth $h = a n^{-1/5}$, we have $(nh)^{-1/2} \sim h^2$
and, thus, the asymptotic bias is of the same size as the variance and is
by (49) equal to $\tfrac{1}{2} a^2 n^{-2/5} m''(t_0) \int u^2 k(u)\,du$. This is consistent with results
in [34], where the independent data part of the following theorem was first
proved for the special case $h = a n^{-1/5}$.

Theorem 6. Assume $m$ is increasing, with $m'(t_0) > 0$ and $t_0 \in (0,1)$.
Assume that $\{\epsilon_i\}$ are independent and identically distributed with $E(\epsilon_i) = 0$
and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Define $x_n$ as in (48), with a nonnegative and compactly
supported kernel of bounded variation and with bandwidth specified below. Let
$\hat m(t) = T_{[0,1]}(x_n)'(t)$. Suppose that one of the following conditions holds:

(i) $\{\epsilon_i\}$ are independent and identically distributed with $E\epsilon_i = 0$ and
$\sigma^2 = \mathrm{Var}(\epsilon_i) < \infty$ and we choose $h \gg a n^{-1/3}$;

(ii) Assumption A8 or A9 holds, $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} \epsilon_i)$ and $\kappa^2$ is defined
in (33) and we choose $h \gg a n^{-1/3}$;

(iii) $\epsilon_i = g(\xi_i)$ is a long range dependent subordinated Gaussian sequence
with parameters $d$ and $r$, and $\beta$ as in (36), and we choose $h \gg l_2(n)\, n^{-rd/(2+rd)}$
and $n \mapsto l_2(n)$ is a slowly varying function defined in the proof below.

Then, correspondingly, we obtain

$$d_n^{-1} (\hat m(t_0) - x_{b,n}'(t_0)) \xrightarrow{\mathcal{L}} Z$$

as $n \to \infty$, where, respectively:

(i) $d_n = (nh)^{-1/2}$, $Z = N(0, \sigma^2 \int k^2(u)\,du)$,

(ii) $d_n = (nh)^{-1/2}$, $Z = N(0, \kappa^2 \int k^2(u)\,du)$,

(iii) $d_n = l_1(nh)^{1/2} (nh)^{-rd/2}$, $Z = |\eta_r| \int k'(-u)\, B_{r,\beta}(u)\,du$, and with $l_1$ the
slowly varying function defined in (35).

Proof. (i) (Independent data case) Since we have $\sigma_{\hat n}^2 = nh\sigma^2$, we see
that $(d_n nh)^{-1} \sigma_{\hat n} = d_n^{-1} (nh)^{-1/2} \sigma$ converges to $c = \sigma$ if we choose $d_n = (nh)^{-1/2}$.
Then the condition $d_n/h \to 0$ holds if $h \gg n^{-1/3}$. So for the stochastic part
of the estimator, we obtain from Theorem 2

$$d_n^{-1} (\hat m(t_0) - x_{b,n}'(t_0)) \xrightarrow{\mathcal{L}} \sigma\omega,$$

provided all regularity conditions are checked. Assumptions A5 and A6 are
trivially satisfied, since $\tilde v(\cdot\,;t_0)$ is linear and $\omega$ is a continuous random
variable with a symmetric distribution. To prove Assumption A2, write $\bar x_{b,n}$ as
in the proof of Theorem 5(i). Then we have

$$g_n(s) = h^{-2} \int l(u)\, \bar x_{b,n}(t_0 - uh)\,du = h^{-2} \int l(u)\, x_b(t_0 - uh)\,du + r_n(s),$$

with

$$l(v) = \left( \frac{h}{d_n} \right)^{2} \left( k\!\left( v + \frac{s d_n}{h} \right) - k(v) - \frac{s d_n}{h}\, k'(v) \right)$$

satisfying

$$\int v^{\lambda} l(v)\,dv = \begin{cases} 0, & \lambda = 0, 1, \\ s^2, & \lambda = 2. \end{cases}$$

Noting that $\sup_{|s| \le c} |r_n(s)| = O(n^{-1} h^{-2}) = o(n^{-1/3}) = o(1)$ as $n \to \infty$, the
rest of the proof of Assumption A2 proceeds as in the proof of Theorem 4(i).
Assumption A1 follows from (32) and the extended continuous mapping
theorem, as noted above. Proposition 1 and Lemmas B.1 and C.1 imply
Assumptions A3 and A4, which ends the proof.

(ii) (Mixing data case) Now we have $\sigma_{\hat n}^2 \sim \kappa^2 \hat n$, and the rest of the proof
proceeds as for independent data.

(iii) (Long range dependent data case) Here

$$(d_n nh)^{-1} \sigma_{\hat n} = (d_n nh)^{-1} (nh)^{1 - rd/2} |\eta_r|\, l_1(nh)^{1/2}
 = d_n^{-1} (nh)^{-rd/2} |\eta_r|\, l_1(nh)^{1/2}.$$

With $c = |\eta_r|$ we obtain

$$|\eta_r| = d_n^{-1} (nh)^{-rd/2} |\eta_r|\, l_1(nh)^{1/2} \iff d_n = l_1(nh)^{1/2} (nh)^{-rd/2}.$$

Thus,

$$l_1(nh)^{-1/2} (nh)^{rd/2} (\hat m(t_0) - x_{b,n}'(t_0)) \xrightarrow{\mathcal{L}} \omega\, |\eta_r|,$$

where $\omega = \int k'(-u)\, B_{r,\beta}(u)\,du$. The condition $d_n/h \to 0$ is satisfied if we let
$h \gg l_2(n)\, n^{-rd/(2+rd)}$, where $l_2$ is any of the slowly varying functions $l_2(\cdot\,;a)$
defined in the proof of Theorem 5(iii). Assumptions A1-A6 are checked as
in parts (i) and (ii). □



Note that since $T(s^2 + c\omega s)(0) = (s^2 + c\omega s)|_{s=0} = 0$, it follows from
Theorems 6 and 1 that

$$\int_0^{t_0} \hat m(s)\,ds - x_n(t_0) = o_p(d_n^2) =
\begin{cases}
o_p((nh)^{-1}) & \text{(independent, weakly dependent data)}, \\
o_p(l_1(nh)\,(nh)^{-rd}) & \text{(long range dependent data)}.
\end{cases}$$

4. Density and distribution function estimation. The empirical distribution
function is of fundamental importance in distribution and density
function estimation. To define this, assume $\{t_i\}_{i=1}^{\infty}$ is a stationary sequence
of random variables with a marginal distribution function $F$. Then the
empirical distribution function is

$$F_n(t) = \frac{1}{n} \sum_{i=1}^{n} 1_{\{t_i \le t\}}.$$

Note that $F_n$ is right continuous with left-hand limits and, thus, $F_n$ lies in
the space $D(-\infty,\infty)$.
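The definition translates directly into code. The following sketch builds $F_n$ as a right-continuous step function from a sample; the helper name `make_ecdf` is ours.

```python
from bisect import bisect_right


def make_ecdf(sample):
    """Return the empirical distribution function t -> #{i : t_i <= t} / n."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(t):
        # bisect_right counts the observations that are <= t, which gives right continuity.
        return bisect_right(ordered, t) / n

    return ecdf
```

Using `bisect_right` rather than `bisect_left` is what makes the function right continuous at the observations, in line with the remark that $F_n$ lies in $D(-\infty,\infty)$.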
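Note also that

$$F_n(t) = F(t) + F_n^0(t),$$

where

$$F_n^0(t) = \frac{1}{n} \sum_{i=1}^{n} (1_{\{t_i \le t\}} - F(t))$$

is the centered empirical process. Consider a sequence $\delta_n$ such that $\delta_n \downarrow 0$,
$n\delta_n \uparrow \infty$ as $n \to \infty$. Define the centered empirical process locally around
$t_0$ on a scale $\delta_n$ as

$$w_{n,\delta_n}(s;t_0) = \sigma_{n,\delta_n}^{-1}\, n\,(F_n^0(t_0 + s\delta_n) - F_n^0(t_0))$$
$$= \sigma_{n,\delta_n}^{-1} \sum_{i=1}^{n} (1_{\{t_i \le t_0 + s\delta_n\}} - 1_{\{t_i \le t_0\}} - F(t_0 + s\delta_n) + F(t_0)),$$

where

$$\sigma_{n,\delta_n}^2 = \mathrm{Var}(n\,(F_n^0(t_0 + \delta_n) - F_n^0(t_0)))
 = \mathrm{Var}\!\left( \sum_{i=1}^{n} (1_{\{t_0 < t_i \le t_0 + \delta_n\}} - F(t_0 + \delta_n) + F(t_0)) \right).$$

We will prove weak convergence $w_{n,\delta_n} \xrightarrow{\mathcal{L}} w$, on $D(-\infty,\infty)$, as $n \to \infty$, for
independent, weakly dependent and subordinated Gaussian long range
dependent data $\{t_i\}$.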

Theorem 7. Assume $\{t_i\}$ are independent, $f(t_0) = F'(t_0)$ exists and
$\delta_n \downarrow 0$, $n\delta_n \uparrow \infty$ as $n \to \infty$. Then

$$\frac{\sigma_{n,\delta_n}^2}{n \delta_n f(t_0)} \to 1$$

and

$$w_{n,\delta_n}(s;t_0) \xrightarrow{\mathcal{L}} B(s)$$

on $D(-\infty,\infty)$, as $n \to \infty$, where $B$ is a standard Brownian motion.

The proof of this theorem is a standard application of the Cramér-Wold
device and tightness; see the technical report of Anevski and Hössjer [3].

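For weakly dependent data, we will use mixing conditions. Define the
$\sigma$-algebras

$$\mathcal{F}_k = \sigma\{t_i : i \le k\},$$
$$\bar{\mathcal{F}}_k = \sigma\{t_i : i \ge k\}.$$

Then the sequence $\{t_i\}$ is said to be $\phi$-mixing if Definition 1 in Section 4 is
applicable, with $t_i$ in place of $\epsilon_i$.

Theorem 8. Assume the stationary sequence $\{t_i\}$ is $\phi$-mixing with
$\sum_{i=1}^{\infty} \phi^{1/2}(i) < \infty$, that $\delta_n \to 0$, $n\delta_n \to \infty$ as $n \to \infty$, $f(t_0) = F'(t_0)$ exists,
as well as the joint density $f_k(s_1, s_2)$ of $(t_i, t_{i+k})$ on $[t_0 - \delta, t_0 + \delta]^2$ for some
$\delta > 0$, and $k \ge 1$. Assume also that we have the bound

$$\sum_{k=1}^{\infty} M_k < \infty,$$

with $M_k = \sup_{t_0 - \delta \le s_1, s_2 \le t_0 + \delta} |f_k(s_1, s_2) - f(s_1) f(s_2)|$. Then

$$\frac{\sigma_{n,\delta_n}^2}{n \delta_n f(t_0)} \to 1 \tag{56}$$

and

$$w_{n,\delta_n}(s;t_0) \xrightarrow{\mathcal{L}} B(s)$$

on $D(-\infty,\infty)$, as $n \to \infty$.

For a proof of this theorem, see the technical report of Anevski and Hössjer
[3].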

In the long range dependent case, as in the partial sum process case, we
make an expansion in Hermite polynomials of the terms in the sum defining
the empirical distribution function at $t \in \mathbb{R}$. In this case, however, the terms
depend on $t$, which makes the analysis somewhat different.

Thus, assume $\{\xi_i\}_{i \ge 1}$ is a stationary Gaussian process with mean zero and
covariance function $\mathrm{Cov}(k) = E(\xi_i \xi_{i+k})$ such that $\mathrm{Cov}(0) = 1$ and
$\mathrm{Cov}(k) = k^{-d} l_0(k)$, where $l_0$ is a function slowly varying at infinity and $0 < d < 1$ is
fixed. Let $g : \mathbb{R} \to \mathbb{R}$ be a measurable function and $t_i = g(\xi_i)$. For a fixed $t$,
expand the function $1_{\{t_i \le t\}} - F(t)$ in Hermite polynomials

$$1_{\{t_i \le t\}} - F(t) = \sum_{k=r(t)}^{\infty} \frac{\eta_k(t)}{k!}\, h_k(\xi_i).$$






Here $h_k$ is the Hermite polynomial of order $k$, and

$$\eta_k(t) = E[(1_{\{t_i \le t\}} - F(t))\, h_k(\xi_i)] = \int (1_{\{g(u) \le t\}} - F(t))\, h_k(u)\, \phi(u)\,du$$

are the $L^2(\phi)$-projections on $h_k$, and $r(t)$ is the index of the first nonzero
coefficient in the expansion. Now let $t$ vary and define the Hermite rank of the functions
$\{1_{\{g(\cdot) \le t\}} - F(t) : t \in \mathbb{R}\}$ as $r = \inf_t r(t)$. Assume that $0 < d < 1/r$. With a
slight abuse of notation, we say that the sequence $\{t_i\}$ is long range
dependent subordinated Gaussian with parameters $d$ and $r$.
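To make the coefficients $\eta_k(t)$ concrete, the following sketch evaluates them numerically in the simplest case $g(u) = u$, so that $t_i = \xi_i$ and $F = \Phi$. This choice of $g$, the midpoint rule, the truncation of the integral at $-10$ and the function names are all assumptions of the illustration. For $k \ge 1$ the term involving $F(t)$ drops out because $E h_k(\xi_i) = 0$, which leaves $\eta_k(t) = \int_{-\infty}^{t} h_k(u)\phi(u)\,du$.

```python
import math


def hermite(k, u):
    """Probabilists' Hermite polynomial h_k(u), via h_{j+1}(u) = u h_j(u) - j h_{j-1}(u)."""
    prev, curr = 1.0, u
    if k == 0:
        return prev
    for j in range(1, k):
        prev, curr = curr, u * curr - j * prev
    return curr


def eta(k, t, steps=20000, lower=-10.0):
    """Midpoint-rule approximation of eta_k(t) for g(u) = u and k >= 1."""
    width = (t - lower) / steps
    total = 0.0
    for j in range(steps):
        u = lower + (j + 0.5) * width
        total += hermite(k, u) * math.exp(-0.5 * u * u)
    return total * width / math.sqrt(2.0 * math.pi)
```

In this special case $\eta_1(t) = -\phi(t)$ is nonzero for every $t$, so the Hermite rank is $r = 1$; the numerical values returned by `eta(1, t)` can be compared with $-\phi(t)$ as a sanity check.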

This implies that the sequence $\{1_{\{t_i \le t\}} - F(t)\}_{i \ge 1}$ exhibits long range
dependence, and $\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} h_r(\xi_i))$ is asymptotically proportional to
$n^{2-rd} l_1(n)$, with $l_1$ defined in (35). From Theorem 1.1 in [17], under the
above assumptions it follows that

$$\sigma_n^{-1}\, n F_n^0(t) \xrightarrow{\mathcal{L}} \frac{\eta_r(t)}{r!}\, z_r$$

on $D[-\infty,\infty]$ equipped with the supnorm-metric. The random variable $z_r$
is the evaluation $z_{r,\beta}(1)$ of the process $z_{r,\beta}$ defined in Section 4, with $\beta$ as
in (36). Note that $z_r$ is Gaussian for $r = 1$ and non-Gaussian for $r \ge 2$. Note
also that the space here is the compact $D[-\infty,\infty]$ and the metric is the
supnorm-metric over the whole extended real line.

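Theorem 9. Assume $\{t_i\}$ is a long range dependent subordinated Gaussian
sequence with parameters $d$ and $r$ and $0 < d < 1/r$. Define

$$\kappa_1 = \min(d, 1 - rd)/2,$$
$$\kappa_2 = \min(2d, 1 - rd)/2.$$

Assume that $\delta_n \to 0$ as $n \to \infty$ and, for some $\varepsilon > 0$,

$$\delta_n \ge n^{-\kappa_1 + \varepsilon} \quad \text{if } d \ge 1/(1+r),$$
$$\delta_n \ge n^{-\kappa_2 + \varepsilon} \quad \text{if } 0 < d < 1/(1+r).$$

Then if $\eta_r$ and $F$ are differentiable at $t_0$ with $\eta_r'(t_0) \ne 0$,

$$\sigma_{n,\delta_n} \sim \sigma_n \delta_n\, \frac{|\eta_r'(t_0)|}{r!} \tag{57}$$

and

$$w_{n,\delta_n}(s;t_0) \xrightarrow{\mathcal{L}} s \cdot \mathrm{sgn}(\eta_r'(t_0))\, z_r \tag{58}$$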

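as $n \to \infty$, on $D(-\infty,\infty)$.

Proof. Write

$$n F_n^0(t) = \frac{\eta_r(t)}{r!} \sum_{i=1}^{n} h_r(\xi_i) + \sigma_n S_n(t),$$

with $S_n$ containing the higher-order terms in the Hermite expansion,

$$S_n(t) = \sigma_n^{-1} \sum_{k=r+1}^{\infty} \frac{\eta_k(t)}{k!} \sum_{i=1}^{n} h_k(\xi_i).$$

Then

$$n\,(F_n^0(t_0 + \delta_n) - F_n^0(t_0)) = \delta_n (\eta_r'(t_0) + o(1))\, \frac{1}{r!} \sum_{i=1}^{n} h_r(\xi_i)
 + \sigma_n (S_n(t_0 + \delta_n) - S_n(t_0))$$

as $n \to \infty$. Thus, to prove (57) it suffices to show that

$$\mathrm{Var}(S_n(t_0 + \delta_n) - S_n(t_0)) = o(\delta_n^2)$$

as $n \to \infty$. With $\bar\sigma_n^2 = \mathrm{Var}(\sum_{i=1}^{n} h_{r+1}(\xi_i))$ we get, for large enough $n$,

$$\mathrm{Var}(S_n(t_0 + \delta_n) - S_n(t_0)) \le 2\,(\mathrm{Var}(S_n(t_0 + \delta_n)) + \mathrm{Var}(S_n(t_0)))
 \sim 4 \eta_{r+1}(t_0)^2 (\bar\sigma_n / \sigma_n)^2 \le n^{-2\kappa_1 + \varepsilon},$$

proving (57) if $d \ge 1/(1+r)$, since then, by assumption, $\delta_n \gg n^{-\kappa_1 + \varepsilon}$.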
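If instead $d < 1/(1+r)$, we define $\tilde S_n$ by

$$S_n(t) = \frac{\eta_{r+1}(t)}{(r+1)!}\, \sigma_n^{-1} \sum_{i=1}^{n} h_{r+1}(\xi_i) + \tilde S_n(t)$$

and, thus,

$$|S_n(t_0 + s\delta_n) - S_n(t_0)|
 \le \frac{1}{(r+1)!} \left| \sigma_n^{-1} \sum_{i=1}^{n} h_{r+1}(\xi_i) \right| |\eta_{r+1}(t_0 + s\delta_n) - \eta_{r+1}(t_0)|
 + |\tilde S_n(t_0 + s\delta_n) - \tilde S_n(t_0)|. \tag{59}$$

Using the relations (cf. page 997 of [12]),

$$(\eta_{r+1}(t_0 + \delta_n) - \eta_{r+1}(t_0))^2 \le (r+1)!\,(F(t_0 + \delta_n) - F(t_0)), \tag{60}$$

$$\bar\sigma_n^2 \le \sigma_n^2\, n^{-d + \varepsilon}, \tag{61}$$

$$\mathrm{Var}(\tilde S_n(t)) \le n^{-2\kappa_2 + \varepsilon},$$

which hold for large enough $n$, from (59) we get

$$\mathrm{Var}(S_n(t_0 + \delta_n) - S_n(t_0))
 \le (\bar\sigma_n / \sigma_n)^2\, O(\delta_n) + 4\,(\mathrm{Var}(\tilde S_n(t_0 + \delta_n)) + \mathrm{Var}(\tilde S_n(t_0))) = o(\delta_n^2)$$

as $n \to \infty$, since, by assumption, $\delta_n \gg n^{-\kappa_2 + \varepsilon}$.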
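To prove (58), notice that

$$w_{n,\delta_n}(s;t_0) = C_n\, \frac{\eta_r(t_0 + s\delta_n) - \eta_r(t_0)}{r!\,\delta_n}\; \sigma_n^{-1} \sum_{i=1}^{n} h_r(\xi_i)
 + C_n \delta_n^{-1} (S_n(t_0 + s\delta_n) - S_n(t_0)),$$

where $C_n = \sigma_n \delta_n / \sigma_{n,\delta_n} \to r!/|\eta_r'(t_0)|$, as $n \to \infty$. Since
$\sigma_n^{-1} \sum_{i=1}^{n} h_r(\xi_i) \xrightarrow{\mathcal{L}} z_r$, and

$$\frac{\eta_r(t_0 + s\delta_n) - \eta_r(t_0)}{\delta_n |\eta_r'(t_0)|} \to s \cdot \mathrm{sgn}(\eta_r'(t_0))$$

uniformly on compacts as $n \to \infty$, (58) will follow if we establish

$$\sup_{|s| \le c} \delta_n^{-1} |S_n(t_0 + s\delta_n) - S_n(t_0)| \xrightarrow{P} 0 \tag{62}$$

as $n \to \infty$.

If $d \ge 1/(1+r)$, then from formula (2.2) in [12] we obtain

$$\delta_n^{-1} |S_n(t_0 + s\delta_n) - S_n(t_0)| \le 2\delta_n^{-1} \sup_t |S_n(t)| = O_p(\delta_n^{-1} n^{-\kappa_1 + \varepsilon}),$$

which proves (62), since $\delta_n \gg n^{-\kappa_1 + \varepsilon}$. If $d < 1/(1+r)$, then from (59), (60),
(61) and formula (2.3) in [12], we have

$$\delta_n^{-1} |S_n(t_0 + s\delta_n) - S_n(t_0)| = \delta_n^{-1}\, O_p(n^{-d/2 + \varepsilon} \delta_n^{1/2} + n^{-\kappa_2 + \varepsilon})$$

as $\delta_n \to 0$. Since $\delta_n \gg n^{-\kappa_2 + \varepsilon}$, implying also $\delta_n \gg n^{-d + 2\varepsilon}$, (62) holds and
thus, (58) is proved. □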

4.1. Estimating an increasing density function. Suppose we have observations
from an unknown density $f$ lying in the class
$\mathcal{F} = \{f : (-\infty, 0] \to [0, \infty),\ f \ge 0,\ \int f(u)\,du = 1,\ f \text{ increasing}\}$,
and assume we want to estimate $f$ at a fixed point $t$. In the case of independent
data, we can easily write down the likelihood and try to maximize this over the
class $\mathcal{F}$. The solution is the nonparametric maximum likelihood estimate, and it
is known to be given by $T_{(-\infty,0]}(F_n)'$ (see [20]), where $F_n$ is the empirical
distribution function. In the case of independent data, also the limit
distributions of $T_{(-\infty,0]}(F_n)$ and $T_{(-\infty,0]}(F_n)'$ are known; see [21, 22, 39]
and [46]. We will put these results into a more general framework.

The algorithm $T_{(-\infty,0]}(F_n)'$ produces an increasing density also in the
case of dependent data, with marginal $f$, while of course the likelihood is
more difficult to work with. Thus, $T_{(-\infty,0]}(F_n)'$ is an ad hoc estimator of an
increasing density in the case of dependent data which lies in $\mathcal{F}$, and for
which we will derive the limit distribution.
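The estimator $T_{(-\infty,0]}(F_n)'$ can be computed by pooling adjacent slopes of the empirical distribution function. The sketch below assumes distinct, strictly negative observations; it uses the fact that the greatest convex minorant of $F_n$ on $(-\infty, 0]$ only has to respect the lower-left corners $(t_{(i)}, (i-1)/n)$ of the steps together with the endpoint $(0, 1)$. The function name `grenander_increasing` is ours.

```python
def grenander_increasing(sample):
    """Slopes of the greatest convex minorant of the empirical distribution function.

    Assumes distinct, strictly negative observations. Returns a list of
    (left, right, density) triples; the estimate is zero to the left of the
    smallest observation.
    """
    xs = sorted(sample)
    n = len(xs)
    knots = xs + [0.0]
    # Each block is [left end, right end, rise of the minorant over the block].
    blocks = []
    for i in range(n):
        blocks.append([knots[i], knots[i + 1], 1.0 / n])
        # Pool with the previous block while slopes fail to be nondecreasing.
        while len(blocks) > 1:
            prev, last = blocks[-2], blocks[-1]
            prev_slope = prev[2] / (prev[1] - prev[0])
            last_slope = last[2] / (last[1] - last[0])
            if prev_slope > last_slope:
                blocks.pop()
                prev[1] = last[1]
                prev[2] += last[2]
            else:
                break
    return [(left, right, rise / (right - left)) for left, right, rise in blocks]
```

The returned step function is nondecreasing and integrates to one, so it lies in the class $\mathcal{F}$ regardless of the dependence structure of the sample.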

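Let

$$x_n(t) = F_n(t), \qquad x_{b,n}(t) = F(t), \qquad v_n(t) = F_n^0(t).$$

Under various dependence assumptions, we have

$$\tilde v_n(s;t_0) = c_n\, w_{n,d_n}(s;t_0) \xrightarrow{\mathcal{L}} w(s) =: \tilde v(s;t_0)$$

on $D(-\infty,\infty)$, as $n \to \infty$, where $\{d_n\}$ is chosen so that

$$c_n = d_n^{-2} n^{-1} \sigma_{n,d_n} \to 1.$$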

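Theorem 10 (Independent and mixing data). Assume $\{t_i\}_{i \ge 1}$ is a
stationary sequence with an increasing marginal density function $f$ such that
$f'(t_0) > 0$ and $t_0 < 0$. Let $F_n(t)$ be the empirical distribution function and
$\hat f_n(t) = T_{(-\infty,0]}(F_n)'(t)$. Suppose that one of the following conditions holds:

(i) $\{t_i\}_{i \ge 1}$ is an i.i.d. sequence;

(ii) $\{t_i\}_{i \ge 1}$ satisfies the assumptions of Theorem 8.

Then we obtain

$$n^{1/3} c_1(t_0)\,(\hat f_n(t_0) - f(t_0)) \xrightarrow{\mathcal{L}} \arg\min_{s \in \mathbb{R}}\,(s^2 + B(s)),$$

$$n^{2/3} c_2(t_0) \left( \int_{-\infty}^{t_0} \hat f_n(s)\,ds - F_n(t_0) \right) \xrightarrow{\mathcal{L}} T(s^2 + B(s))(0)$$

as $n \to \infty$, with

$$c_1(t_0) = \tfrac{1}{2} f(t_0)^{-1/3} \left( \tfrac{1}{2} f'(t_0) \right)^{-1/3},$$

$$c_2(t_0) = f(t_0)^{-2/3} \left( \tfrac{1}{2} f'(t_0) \right)^{1/3},$$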

and $B$ a standard two-sided Brownian motion.

Proof. (i) (Independent data case) To determine the constants, we use
Theorem 7,

$c_n = d_n^{-2} n^{-1} (n d_n f(t_0))^{1/2} = d_n^{-3/2} n^{-1/2} f(t_0)^{1/2} \sim 1$
$\iff d_n \sim f(t_0)^{1/3} n^{-1/3}$.

Since $x_{b,n} = F$ is convex and $x_{b,n}''(t_0) = f'(t_0)$, Assumption A2 is satisfied with
$A = \tfrac12 f'(t_0)$. From Theorem 7, it follows that Assumption A1 is satisfied, and
Proposition 1 and Lemma C.1 imply Assumptions A3 and A4. Assumptions
A5, A6 and A7 hold by properties of Brownian motion (cf. [42] for an LIL for
Brownian motion implying Assumption A6 by Note 1), so that Corollary 1
implies

$d_n^{-1}(\hat f_n(t_0) - f(t_0)) \xrightarrow{\mathcal{L}} 2\sqrt{A}\,\arg\min_s (s^2 + B(s/\sqrt{A}))$

$\stackrel{\mathcal{L}}{=} 2A^{1/3} \arg\min_s (s^2 + B(s))$

as $n \to \infty$ and, thus, $c_1(t_0)$ has the form stated in the theorem. Then
Theorem 1 implies

$d_n^{-2}\bigl(\int_{-\infty}^{t_0} \hat f_n(s)\,ds - F_n(t_0)\bigr) \xrightarrow{\mathcal{L}} T(As^2 + B(s))(0)$

$\stackrel{\mathcal{L}}{=} A^{-1/3}\, T(s^2 + B(s))(0)$

as $n \to \infty$ and, thus, $c_2(t_0)$ has the form stated in the theorem.

(ii) (Mixing data case) The proof is completely analogous to the i.i.d.
case and uses Theorem 8 instead of Theorem 7. □
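The powers of $A$ in the proof above come from Brownian scaling. The following worked step is a sketch under the assumption that the exponents in the proof read $A^{1/3}$ and $A^{-1/3}$; it uses only the self similarity of $B$ and the fact that the greatest convex minorant commutes with a linear change of the time variable and with multiplication by a positive constant.

```latex
% Substitute s = A^{-2/3} u and use B(A^{-2/3} u) \stackrel{\mathcal L}{=} A^{-1/3} B(u):
A s^2 + B(s) \;\stackrel{\mathcal L}{=}\; A^{-1/3}\bigl(u^2 + B(u)\bigr),
\qquad s = A^{-2/3} u .
% Values of the minorant scale by A^{-1/3}; slopes gain the factor du/ds = A^{2/3}:
T(As^2 + B(s))(0) \;\stackrel{\mathcal L}{=}\; A^{-1/3}\, T(u^2 + B(u))(0),
\qquad
T(As^2 + B(s))'(0) \;\stackrel{\mathcal L}{=}\; A^{1/3}\, T(u^2 + B(u))'(0).
```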

Theorem 11 (Long range dependent data). Assume $\{t_i\}_{i\ge 1}$ is a long
range dependent subordinated Gaussian sequence with parameters $r = 1$ and
$0 < d < 1/2$, and $\beta$ as in (36). Let $f$ be the marginal density function of
$\{t_i\}$, and assume $f$ is increasing with $f'(t_0) > 0$ and $t_0 < 0$. Then with $F_n$
the empirical distribution function and $\hat f_n(t) = T_{(-\infty,0]}(F_n)'(t)$,

$|\eta_1'(t_0)|^{-1} l_1(n)^{-1/2} n^{d/2} (\hat f_n(t_0) - f(t_0)) \xrightarrow{\mathcal{L}} N(0,1)$

as $n \to \infty$, with $l_1$ a function slowly varying at infinity, defined as in (35).

Proof. We have

$c_n = d_n^{-2} n^{-1} \sigma_{n,d_n} \sim d_n^{-1} n^{-1} \sigma_n |\eta_r'(t_0)|/r! \sim 1$
$\iff d_n \sim |\eta_r'(t_0)|\, l_1(n)^{1/2} n^{-rd/2}/r!$,






where $\sigma_n$ and $l_1$ are defined before Theorem 9. Note that the assumptions in
Theorem 9, with $\delta_n = d_n$, are only satisfied for $r = 1$, $0 < d < 1/2$, for which
case we have $w_{n,d_n} \xrightarrow{\mathcal{L}} s \cdot z_1 =: w(s)$ as $n \to \infty$, with $z_1$ a standard Gaussian
random variable. Theorem 2 implies

$d_n^{-1}(\hat f_n(t_0) - f(t_0)) \xrightarrow{\mathcal{L}} T(s^2 + z_1 s)'(0) = z_1$

as $n \to \infty$, implying the theorem. Assumption A1 follows from Theorem 9,
Assumption A2 is established as in Theorem 10(i), Assumptions A3 and A4
follow from Proposition 1 and Lemma C.1 and Assumptions A5 and A6 are
trivially satisfied; see Note 1. □

4.2. Estimating a convex density function. Suppose $f : [0,\infty) \to [0,\infty)$ is
a convex density function, and we want to find an estimator of $f$ at a fixed
point $t_0 > 0$. For independent data, it is possible to define the nonparametric
maximum likelihood estimate. The algorithm for calculating this is quite
complicated though; see [30]; see also [23, 24] for the limit distribution. We
present the following alternative estimator: Let

$x_n(t) = n^{-1} h^{-1} \sum_{i=1}^{n} k\bigl(\frac{t - t_i}{h}\bigr)$

be the kernel estimator of the density $f$, with $k$ a density function supported
on $[-1,1]$, and $h > 0$ the bandwidth. Define the (nonnormalized)
density estimate $\hat f_n(t) = T(x_n)(t)$, and note that $\hat f_n$ is convex and positive,
but does not integrate to one [note that the estimator $T_{[0,\infty)}(x_n)/I_n$ is a
convex density function, where $I_n = \int_0^\infty T_{[0,\infty)}(x_n)(s)\,ds$]. We will state the
limit distributions for $\hat f_n$ in the weakly dependent cases; see Section 1 for an
interpretation. In the long range dependent case the limit process is
pathological [$v(s;t_0) = 0$], so that the rate of convergence is faster than indicated
by our approach. Since a further study of this case is not straightforward,
we refrain from more work on this; see the remark after Theorem 12.
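A rough numerical sketch of this estimator is given below. It evaluates the kernel estimate $x_n$ on a finite grid and takes the greatest convex minorant of the grid values, which only approximates $T(x_n)$; the Epanechnikov kernel, the grid, and the names `epanechnikov`, `kernel_density`, `gcm_on_grid` and `convex_density_estimate` are assumptions made for the illustration and are not taken from the paper.

```python
def epanechnikov(u):
    """A kernel density supported on [-1, 1] with bounded derivative."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density(sample, h, t):
    """x_n(t) = (n h)^{-1} sum_i k((t - t_i) / h)."""
    return sum(epanechnikov((t - ti) / h) for ti in sample) / (len(sample) * h)


def gcm_on_grid(xs, ys):
    """Greatest convex minorant of the points (xs[i], ys[i]), evaluated at xs."""
    hull = []  # indices of the vertices of the lower convex hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) - (ys[b] - ys[a]) * (xs[i] - xs[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = list(ys)
    # Linear interpolation between consecutive hull vertices.
    for a, b in zip(hull, hull[1:]):
        for j in range(a, b + 1):
            w = (xs[j] - xs[a]) / (xs[b] - xs[a])
            out[j] = (1 - w) * ys[a] + w * ys[b]
    return out


def convex_density_estimate(sample, h, grid):
    """Grid values of x_n and of its (nonnormalized) convex minorant."""
    raw = [kernel_density(sample, h, t) for t in grid]
    return raw, gcm_on_grid(grid, raw)
```

The grid should start at 0 and extend beyond the largest observation plus $h$, so that the minorant on the grid is a reasonable stand-in for the minorant on $[0,\infty)$. Dividing the fitted values by their numerical integral gives the normalized version mentioned in the text.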
We can write

$x_n(t) = h^{-1} \int k'(u) F_n(t - hu)\,du$,
$x_{b,n}(t) = h^{-1} \int k'(u) F(t - hu)\,du$,
$v_n(t) = h^{-1} \int k'(u) F_n^0(t - hu)\,du$.




4.2.1. The case $d_n = h$. The rescaled process is

$v_n(s;t_0) = c_n \int k'(u)\bigl(w_{n,d_n}(s - u;t_0) - w_{n,d_n}(-u;t_0)\bigr)\,du$,

with $c_n = d_n^{-2}(nh)^{-1}\sigma_{n,d_n}$. Choosing $d_n$ so that $c_n \to c$ as $n \to \infty$, for some
constant $c$, we obtain

(63) $v_n(s;t_0) \xrightarrow{\mathcal{L}} c \int k'(u)\bigl(w(s - u) - w(-u)\bigr)\,du =: v(s;t_0)$

on $D(-\infty,\infty)$ as $n \to \infty$, using the continuous mapping theorem as in
Section 4.2.1, and with $w$ the weak limit of $\{w_n\}$, assuming that $k'$ is bounded
and since $k$ has compact support.

Recall the definition of the (nonnormalized) density estimate $\hat f_n(t) =
T(x_n)(t)$. The rate $n^{-2/5}$ for the estimator in Theorem 12 is the same as
in the limit distribution of the NPMLE; see [23, 24]. This is also the optimal
rate for estimating a convex density from independent observations; see [2].

Theorem 12 (Independent and mixing data). Let $\{t_i\}_{i\ge 1}$ be a stationary
sequence with a convex marginal density function $f$ such that $f''(t_0) > 0$
and $t_0 > 0$. Let $x_n(t)$ be the kernel density function above with $k$ a compactly
supported density such that $k'$ is bounded, $h = a n^{-1/5}$ and $a > 0$ an arbitrary
constant. Suppose that one of the following conditions holds:

(i) $\{t_i\}_{i\ge 1}$ is an i.i.d. sequence;

(ii) $\{t_i\}_{i\ge 1}$ satisfies the assumptions of Theorem 9.

Then we obtain

$n^{2/5}(\hat f_n(t_0) - f(t_0))$

$\xrightarrow{\mathcal{L}} a^2\, T\bigl(\tfrac12 f''(t_0)s^2 + v(s;t_0)\bigr)(0) + \tfrac12 a^2 \int u^2 k(u)\,du\, f''(t_0) + c\,a^2 \int k'(u) w(-u)\,du$

as $n \to \infty$, with $c = a^{-5/2} f(t_0)^{1/2}$, $v(s;t)$ as in (63) and $w$ a standard
two-sided Brownian motion.

Proof. (i) (Independent data case) We have $\sigma^2_{n,d_n} \sim n d_n f(t_0)$, so that

$d_n^{-2}(nh)^{-1}\sigma_{n,d_n} \sim d_n^{-5/2} n^{-1/2} f(t_0)^{1/2}$.

If $d_n = a n^{-1/5}$, we get $c = a^{-5/2} f(t_0)^{1/2}$ and

(64) $n^{2/5}(\hat f_n(t_0) - x_n(t_0)) \xrightarrow{\mathcal{L}} a^2\, T\bigl(\tfrac12 f''(t_0)s^2 + v(s;t_0)\bigr)(0)$

follows from Theorem 1, provided the conditions in Theorem 1 hold. Notice
that

$x_{b,n}(t) = h^{-1} \int k\bigl(\frac{t - u}{h}\bigr) f(u)\,du$

is convex, which establishes Assumption A2, with $A = \tfrac12 f''(t_0)$, similar to
the argument in Section 4.2.1. Assumptions A1, A3, A4 and A5 are also
verified analogously to the argument in Section 4.2.1. Finally,

(65) $d_n^{-2}(x_n(t_0) - x_{b,n}(t_0)) = d_n^{-3} n^{-1}\sigma_{n,d_n} \int k'(u) w_{n,d_n}(-u;t_0)\,du \xrightarrow{\mathcal{L}} c \int k'(u) w(-u)\,du$

and

(66) $d_n^{-2}(x_{b,n}(t_0) - f(t_0)) \to \tfrac12 \int u^2 k(u)\,du\, f''(t_0)$.

The joint convergence in (64) and (65) (cf. the proof of Theorem 4), together
with (66), shows the statement of the theorem for the independent data case.

(ii) (Mixing data case) Similar to the proof of case (i). □

For long range dependent data, as in Section 4.1, we are restricted to the
case $r = 1$, $0 < d < 1/2$. But now

$v(s;t_0) = c \int k'(u)\bigl((s - u)z_1 - (-u)z_1\bigr)\,du = c\, s\, z_1 \int k'(u)\,du = 0$,

where the first equality holds since $w(s) = s \cdot z_1$. This indicates that the
rate of convergence is faster than obtained solving $c_n = d_n^{-3} n^{-1}\sigma_{n,d_n} = c$.
We refrain from further work for this case.
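The degeneracy can be seen numerically: for any kernel that vanishes at the ends of its support, $\int k'(u)\,du = 0$, so the integral defining $v(s;t_0)$ cancels for a linear $w$. The sketch below uses the Epanechnikov kernel as an assumed example and a midpoint rule; `k_prime` and `limit_process` are hypothetical names.

```python
def k_prime(u):
    """Derivative of the Epanechnikov kernel k(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return -1.5 * u if abs(u) <= 1.0 else 0.0


def limit_process(s, z1, c=1.0, m=2000):
    """Midpoint-rule value of c * int k'(u) ((s - u) z1 - (-u) z1) du over [-1, 1]."""
    du = 2.0 / m
    total = 0.0
    for j in range(m):
        u = -1.0 + (j + 0.5) * du
        # With w(s) = s * z1 the bracket reduces to s * z1 for every u.
        total += k_prime(u) * ((s - u) * z1 - (-u) * z1) * du
    return c * total
```

For any `s` and `z1` the returned value is zero up to rounding error, which matches the display above.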



4.2.2. The cases $d_n/h \to 0$ and $d_n/h \to \infty$ as $n \to \infty$. We refer the
interested reader to the technical report of Anevski and Hossjer [3].

5. Self similarity and rates of convergence. In many of the examples
treated in Sections 3 and 4, the stochastic part $v_n$ of $x_n$ is asymptotically
self similar in the following sense: There exists a sequence $a_n \downarrow 0$ such that
$a_n^{-1} v_n$ converges in distribution on a scale $b_n$ around $t_0$. More precisely, we
assume the existence of a limit process $\tilde v(\cdot;t_0)$ such that

(67) $a_n^{-1} v_n(t_0 + s b_n) \xrightarrow{\mathcal{L}} \tilde v(s;t_0)$.

For local estimators, $b_n \downarrow 0$ and the convergence in (67) takes place in
$D(-\infty,\infty)$. For global estimators, we have $b_n = 1$ and then the convergence
takes place in $D(J - t_0)$. Further, assume that $\tilde v(\cdot;t_0)$ is locally self similar,
in the sense that, for some $\beta > 0$ and some process $v(\cdot;t_0)$,



(68) $\delta^{-\beta}\bigl(\tilde v(\delta s;t_0) - \tilde v(0;t_0)\bigr) \xrightarrow{\mathcal{L}} v(s;t_0)$

on $D(-\infty,\infty)$ as $\delta \to 0$. Suppose that $d_n \ll b_n$ and put $\delta_n = d_n/b_n$. If we
can interchange limits between (67) and (68), we obtain

(69) $d_n^{-p}\bigl(v_n(t_0 + s d_n) - v_n(t_0)\bigr) \stackrel{\mathcal{L}}{\approx} d_n^{-p} a_n \bigl(\tilde v(s\delta_n;t_0) - \tilde v(0;t_0)\bigr) \stackrel{\mathcal{L}}{\approx} d_n^{-p} a_n \delta_n^{\beta}\, v(s;t_0)$.

Thus, Assumption A1 requires [up to a factor $1 + o(1)$]

(70) $d_n^{-p} a_n \delta_n^{\beta} = 1 \iff d_n = (a_n b_n^{-\beta})^{1/(p-\beta)}$,

which is then a general formula for choosing $d_n$. As a consequence of this,

$T(x_n)(t_0) - x_n(t_0) = O_p(d_n^{p}) = O_p\bigl((a_n b_n^{-\beta})^{p/(p-\beta)}\bigr)$,

$T(x_n)'(t_0) - x_{b,n}'(t_0) = O_p(d_n^{p-1}) = O_p\bigl((a_n b_n^{-\beta})^{(p-1)/(p-\beta)}\bigr)$.

For instance, in Theorem 3(i) we have, by Donsker's theorem, $a_n = n^{-1/2}$,
$b_n = 1$ and $\tilde v(s;t_0) = \sigma B(s + t_0)$, $s \in J - t_0$, with $B$ a standard Brownian
motion. Since the Brownian motion has stationary increments and is self
similar with $\beta = 1/2$, we can put $v(s;t_0) = \sigma B(s)$ on $D(-\infty,\infty)$ so that

$d_n = a_n^{1/(p-\beta)} = n^{-(1/2)(1/(p-1/2))} = n^{-1/(2p-1)}$,

that is, $d_n = n^{-1/3}$ when $p = 2$.

Table 1 lists values of $a_n$, $b_n$, $\beta$ and $d_n$ for all examples with $d_n \ll b_n$ in
Sections 3 and 4 when $p = 2$. For the long range dependent examples, we
have simplified and put $l_1(n) = 1$ for the slowly varying function $l_1$ in (35).
For general $l_1$, formula (67) is not valid. We have also ignored constants (not
depending on $n$) of $d_n$.

Table 1
Convergence rates $d_n$ for various choices of $a_n$, $b_n$ and $\beta$

Theorem                    $a_n$                $b_n$   $\beta$      $d_n$
3(i), (ii), 10(i), (ii)    $n^{-1/2}$           1       1/2          $n^{-1/3}$
3(iii)                     $n^{-rd/2}$          1       $1 - rd/2$   $n^{-rd/(2+rd)}$
6(i), (ii)                 $n^{-1/2}h^{1/2}$    $h$     1            $(nh)^{-1/2}$
6(iii)                                          $h$     1
11                         $n^{-d/2}$           1       1            $n^{-d/2}$
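Formula (70) is easy to evaluate when $a_n$ and $b_n$ are powers of $n$. The helper below is a small sketch with a hypothetical name, `rate_exponent`; it takes $a_n = n^{a}$ and $b_n = n^{b}$ and returns the exponent $e$ in $d_n = n^{e}$, using exact rational arithmetic.

```python
from fractions import Fraction


def rate_exponent(a_exp, b_exp, beta, p=2):
    """Exponent e with d_n = n**e, from d_n = (a_n * b_n**(-beta))**(1/(p - beta)),
    where a_n = n**a_exp and b_n = n**b_exp."""
    a_exp, b_exp, beta, p = map(Fraction, (a_exp, b_exp, beta, p))
    if beta >= p:
        raise ValueError("formula (70) needs beta < p")
    return (a_exp - beta * b_exp) / (p - beta)
```

With $a_n = n^{-1/2}$, $b_n = 1$ and $\beta = 1/2$ this gives $-1/3$, the cube-root rate of Theorem 3(i). As a purely illustrative choice, a bandwidth $h = n^{-1/5}$ with $a_n = n^{-1/2}h^{1/2}$, $b_n = h$ and $\beta = 1$ gives $-2/5$, that is, $d_n = (nh)^{-1/2}$.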

For Theorem 6, we have replaced (67) with the more general requirement

(71) $a_n^{-1}\bigl(v_n(t_0 + s b_n) - v_n(t_0)\bigr) \xrightarrow{\mathcal{L}} \tilde v(s;t_0)$.

Otherwise, $a_n$ will be too large to give the correct value of $d_n$ when plugged
into (70). Notice that the derivation of (69) is still valid, even though we
replace (67) with (71). For instance, in Theorem 6(i) we write $v_n(t) =
n^{-1/2}\sigma \int k(u) B(t - uh)\,du + o_p(1)$ and put $a_n = n^{-1/2}h^{1/2}$, $b_n = h$, so that

$a_n^{-1}\bigl(v_n(t_0 + s b_n) - v_n(t_0)\bigr)$

$= h^{-1/2}\sigma \int k(u)\bigl(B(t_0 + sh - uh) - B(t_0 - uh)\bigr)\,du + o_p(1)$

$\stackrel{\mathcal{L}}{=} \sigma \int k(u)\bigl(B(s - u) - B(-u)\bigr)\,du + o_p(1)$.

In the last step, we have used the fact that $B$ has stationary increments and
is self similar. Thus, from (71) we obtain

$\tilde v(s;t_0) = \sigma \int B(u)\bigl(k(s - u) - k(-u)\bigr)\,du$,

from which follows $\beta = 1$ and

$v(s;t_0) = \lim_{\delta \to 0} \sigma s \int B(u) \frac{k(s\delta - u) - k(-u)}{\delta s}\,du$

$= \sigma s \int B(u) k'(-u)\,du =: \sigma s Z$,

where $Z \in N(0, \int k^2(u)\,du)$.

6. Concluding remarks. Several of the applications in Sections 3 and 4 
are not stated in the most general form, because of a desire to keep the paper 
self-contained. Therefore, we would like to point out some generalizations 
that can be made. Section 3 on regression could be made more inclusive by 
allowing for heteroscedasticity and nonequidistant design points, as consid- 
ered by Wright [47]; see also [33] and [34]. 

Furthermore, we have not made an extensive study of all possible mixing 
conditions, and whether, for instance, in these cases the bounds derived 
in Appendices B and C apply; neither have we tried to apply our results 
to short range dependent subordinated Gaussian data as defined in [15]. 
Long range dependent data limit results under exponential subordination 
are derived in [18]; we have not tried to apply our results to this case. 

In Section 4 it is possible to prove results for estimators of a monotone 
density and of its derivative, by isotonization of a kernel density estimate; 
the calculations are similar to Section 3.3. 

It is also possible to use Theorems 1 and 3 for $p \ne 2$, which would
generalize the theorems in Sections 3 and 4. Since existing results, such as Wright [47]
and Leurgans [33], deal with independent data, this would constitute new
results for weakly dependent and long range dependent data.

Unimodal density estimation, with known or unknown mode, is related to
density estimation under monotonicity assumptions, and the distributional
limit results are identical to ours when using the respective NPMLE. We
conjecture that the results for long range dependent data in Section 4 hold
also for unimodal densities.

The case $p = 1$, corresponding to nondifferentiable target functions, is not
covered in this paper. It is treated in [4]; the approach is somewhat different
and the limit distributions are different from the ones obtained in this paper.

It would be interesting to extend our results to process results, such as 
in, for example, [5, 28, 31]; we have not attempted to do so in this paper. 

Alongside regression and density estimation, a third topic which can be 
treated with arguments similar to this paper is the monotone deconvolution 
problem; see [45]. In fact, in [1], the asymptotic theory of Section 2 is applied 
to the monotone deconvolution problem. 

APPENDIX A: PROOFS OF RESULTS IN SECTION 2 
In this appendix we prove the statements in Section 2. 

Proof of Proposition 1. We first show that if Assumption A2 holds,
then $g_n$ can be bounded in the following manner: for any constant $\kappa > 0$,
there is a $\tau < \infty$, such that

(72) $\liminf_{n\to\infty} \inf_{|s|\ge\tau} (g_n(s) - \kappa|s|) > 0$.

To prove this, from (12) it follows that, given any $\tau > 0$ and $\varepsilon$ such that
$0 < \varepsilon < A\tau^p/2$,

$g_n(\pm\tau) > A\tau^p - \varepsilon$,

if $n \ge n_0$ for some finite $n_0 = n_0(\varepsilon)$. Since $g_n(0) = 0$ and $g_n$ is convex, it
follows that

$g_n(s) \ge (A\tau^p - \varepsilon)|s|/\tau > \tfrac12 A\tau^{p-1}|s|$,

when $|s| \ge \tau$, for all $n \ge n_0$. Thus, (72) holds with $\kappa = \tfrac12 A\tau^{p-1}$. Since $\tau$ can
be chosen arbitrarily large, so can $\kappa$.

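We are now ready to establish Assumptions A3 and A4. Choose $\tau = \tau(\delta,\varepsilon)$
as in (14). Then (14) and (72) imply

$\liminf_{n\to\infty} P\bigl(\inf_{|s|\ge\tau}(y_n(s) - \kappa(1-\varepsilon)|s|) > 0\bigr) > 1 - \delta$,

and this proves Assumption A3 [with $\kappa(1-\varepsilon)$ in place of $\kappa$]. To establish
Assumption A4, we notice that the convexity of $g_n$ and $g_n(0) = 0$ imply that
$g_n(s)/s$ is increasing on $\mathbb{R}^+$. With $\tau = \tau(\delta,\varepsilon)$ as above, we get

$\inf_{\tau\le s\le c} \frac{y_n(s)}{s} \le (1+\varepsilon) \inf_{\tau\le s\le c} \frac{g_n(s)}{s} = (1+\varepsilon)\frac{g_n(\tau)}{\tau}$

and, similarly,

$\inf_{s\ge c} \frac{y_n(s)}{s} \ge (1-\varepsilon) \inf_{s\ge c} \frac{g_n(s)}{s} \ge (1-\varepsilon)\frac{g_n(c)}{c} \ge (1-\varepsilon)\frac{g_n(\tau)}{\tau}$.

Choose $M > 0$ so that $\sup_n |g_n(\tau)| < M$. Then

$\inf_{\tau\le s\le c} \frac{y_n(s)}{s} - \inf_{s\ge c} \frac{y_n(s)}{s} \le 2\varepsilon\frac{g_n(\tau)}{\tau} \le \frac{2\varepsilon M}{\tau}$.

Since $\varepsilon$ can be made arbitrarily small, the first part of Assumption A4 is
proved. The second part is proved in the same way. □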

Proof of Proposition 2. Define $h(z)$ as in (17). Then since $T_c(z)'(0) =
T_c(z)'(0-)$ if and only if $h$ is a functional continuous at $z$, the set of
discontinuities of $h$ is

$D_h = \{z : T_c(z)'(0) > T_c(z)'(0-)\}$

[recall that $T_c(z)'$ denotes the right-hand derivative of $T_c(z)$]. Let

$\Omega_c(a,\varepsilon) = \{z : z(s) - z(0) - as \ge \varepsilon|s| \text{ for all } s \in [-c,c]\}$.

Then $D_h \subset \bigcup_{i\ge 1} \Omega_c(a_i,\varepsilon_i)$ for some countable sequence $\{(a_i,\varepsilon_i)\}$. Thus,

(18) implies P{Dh) = 0, and Assumptions Al and A2 imply — > y. By 
assumption D'f^ is separable and completely regular and, thus, the continu- 

ous mapping theorem (cf. Theorem 4.12 in [38]) implies h{yn) h{y). □ 

We will next go through a sequence of results that were used in the proofs 
of Theorems 1 and 2 and Corollary 1. We start by stating some elementary 
properties of the functionals. A point t such that T{y)(t) = Tc{y){t) we call 
a point of touch of T{y) and Tc{y). 

Lemma A.1. Assume $y \in D(\mathbb{R})$. If $T(y)$ and $T_c(y)$ have no points of
touch on the interval $I \subset \mathbb{R}$, then $T(y)$ is linear on $I$. If $A \subset B$ are finite
subsets of $\mathbb{R}$ and $y$ is bounded from above by $M$ on $A$, and bounded from
below by the same $M$ on $B^c$, then

(73) $\inf_{s\in B} |y(s) - T(y)(s)| = 0$.

For any interval $O$ of $\mathbb{R}$, and functions $l, h$ on $O$ such that $l$ is linear and
constant $a$ we have

(74) $T_O(h + l) = T_O(h) + l$, $\quad T_O(ah) = a T_O(h)$,

$T$ is monotone, that is,

(75) $y_1 \le y_2 \implies T(y_1) \le T(y_2)$.

If $r$ is another function on $O$,

(76) $\sup_{t\in O} |T_O(r + h)(t) - T_O(h)(t)| \le \sup_{t\in O} |r(t)|$.

Proof. Assume $T(y)$ and $T_c(y)$ have no point of touch on $I$. Then since
$T_c(y)$ is a minorant of $y$, $T(y)$ and $y$ also have no point of touch on $I$ and,
thus, $T(y)$ is linear on $I$.

To prove (73), suppose

$\inf_{s\in B} |y(s) - T(y)(s)| = \varepsilon > 0$.

Then $T(y)$ is a straight line $l$ on $B$, and further, $T(y) \le M - \varepsilon$ on $A$. Assume
w.l.o.g. that $l' \ge 0$ and let $b$ be the left end point of $B$. Then $l(s) \le M - \varepsilon$
on $(-\infty,b]$ and, thus,

$\inf_{s\le b} (y(s) - l(s)) \ge \varepsilon > 0$,

which is impossible by the construction of $T(y)$.

Equations (74) and (75) are immediate, and (76) follows from these two,
since for arbitrary $t \in O$,

$T_O(h)(t) - \sup_{s\in O}|r(s)| = T_O\bigl(h - \sup_{s\in O}|r(s)|\bigr)(t)$

$\le T_O(h + r)(t)$

$\le T_O\bigl(h + \sup_{s\in O}|r(s)|\bigr)(t)$

$= T_O(h)(t) + \sup_{s\in O}|r(s)|$. □

We next state a local limit distribution theorem.
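Before moving on, the elementary properties (74) and (76) of Lemma A.1 can be checked on a finite grid. The sketch below computes the greatest convex minorant of values on the grid $0, 1, \ldots, n-1$ with exact rational arithmetic; `gcm` and `check_lemma_a1` are hypothetical helper names, the constant `a` is assumed to be positive, and the grid version is only a discrete stand-in for the functional $T_O$.

```python
from fractions import Fraction


def gcm(ys):
    """Greatest convex minorant of ys on the unit-spaced grid 0, ..., len(ys) - 1."""
    hull = []  # indices of the vertices of the lower convex hull
    for i, y in enumerate(ys):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (b - a) * (y - ys[a]) - (ys[b] - ys[a]) * (i - a) <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = list(ys)
    for a, b in zip(hull, hull[1:]):
        for j in range(a, b + 1):
            out[j] = ys[a] + (ys[b] - ys[a]) * Fraction(j - a, b - a)
    return out


def check_lemma_a1(h, slope, intercept, a, r):
    """Check T(h + l) = T(h) + l, T(a h) = a T(h) for a > 0, and
    sup |T(r + h) - T(h)| <= sup |r| on the grid."""
    line = [slope * i + intercept for i in range(len(h))]
    th = gcm(h)
    shift_ok = gcm([u + v for u, v in zip(h, line)]) == [u + v for u, v in zip(th, line)]
    scale_ok = gcm([a * u for u in h]) == [a * u for u in th]
    tr = gcm([u + v for u, v in zip(h, r)])
    lipschitz_ok = max(abs(u - v) for u, v in zip(tr, th)) <= max(abs(v) for v in r)
    return shift_ok, scale_ok, lipschitz_ok
```

Rational inputs keep the first two comparisons exact, so the equalities in (74) are tested as identities rather than up to a tolerance.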



Lemma A.2. Let $t_0 \in J$ be fixed, and assume Assumptions A1 and A2
hold. Then

$d_n^{-p}\bigl[T_{c,n}(x_n)(t_0) - x_n(t_0)\bigr] \xrightarrow{\mathcal{L}} T_c\bigl[A|s|^p + v(s;t_0)\bigr](0)$

as $n \to \infty$, and with $A > 0$ as in Assumption A2.

Proof. A $t$ varying in $[t_0 - c d_n, t_0 + c d_n]$ can be written as $t = t_0 + s d_n$
with $s \in [-c,c]$. Using the representation (23) and (74), we have

$d_n^{-p}\bigl[T_{c,n}(x_n)(t_0) - x_n(t_0)\bigr] = T_c\bigl[g_n(s) + v_n(s;t_0)\bigr](0)$.

From Assumption A2, and the fact that $v_n \xrightarrow{\mathcal{L}} v$ on $D[-c,c]$, we get that
$g_n(s) + v_n(s;t_0) \xrightarrow{\mathcal{L}} A|s|^p + v(s;t_0)$ on $D[-c,c]$. Equations (74) and the fact
that $T$ is a continuous map from $D[-c,c]$ to $C[-c,c]$ [i.e., (76)] imply by
the continuous mapping theorem the statement of the lemma. □

Next we show that the difference between the localized functional $T_c$
and the global $T$ goes to zero as $c$ grows to infinity.

Let us consider a sequence of stochastic processes $\{y_n\}_{n\ge 1}$ in $D(-\infty,\infty)$,
for which we will state a truncation result. First we need the following
additional assumption.

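Assumption A.1 (Compact boundedness). For every compact set $K$
and $\delta > 0$, there is a finite $M = M(K,\delta)$ such that

$\limsup_{n\to\infty} P\bigl(\sup_{s\in K} |y_n(s)| > M\bigr) < \delta$.

Theorem A.1. Assume $y_n$ satisfies Assumptions A3, A4 and A.1. Then
for every finite interval $I$ in $\mathbb{R}$ and $\varepsilon > 0$,

$\lim_{c\to\infty} \limsup_{n\to\infty} P\bigl(\sup_I |T_c(y_n)'(\cdot) - T(y_n)'(\cdot)| > \varepsilon\bigr) = 0$,

$\lim_{c\to\infty} \limsup_{n\to\infty} P\bigl(\sup_I |T_c(y_n)(\cdot) - T(y_n)(\cdot)| > \varepsilon\bigr) = 0$.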

Proof. Let $\delta > 0$ be arbitrary and put $K = [-1,1]$. Define the sets

$A(n,\tau,M,\kappa) = \{\sup_{s\in K} y_n(s) < M\} \cap \{\inf_{|s|\ge\tau}(y_n(s) - \kappa|s|) > 0\}$.

If $I$ is an arbitrary interval of $[-c,c]$, define the sets

$B(n,c,I,\varepsilon) = \{\sup_{t\in I} |T_c(y_n)'(t) - T(y_n)'(t)| \le \varepsilon\}$.

From Lemma A.4, it follows that

(77) $B(n,c,\{-\tau\},\varepsilon) \cap B(n,c,\{\tau\},\varepsilon) \subset B(n,c,I,\varepsilon)$

for any $I \subset [-\tau,\tau]$, if $\tau < c$.

We will show that, given any $\delta > 0$, if $c$ is large enough,

(78) $\limsup_{n\to\infty} P\bigl(B(n,c,\{\tau\},\varepsilon)^c \cap A(n,\tau,M,\kappa)\bigr) < \delta$,

(79) $\limsup_{n\to\infty} P\bigl(B(n,c,\{-\tau\},\varepsilon)^c \cap A(n,\tau,M,\kappa)\bigr) < \delta$.

Combining Assumptions A3 and A.1, we find that if $M$ and $\tau$ are large
enough and $\kappa > 0$ is small enough, then

$\limsup_{n\to\infty} P\bigl(A(n,\tau,M,\kappa)^c\bigr) < \delta$.

Using (77), this will imply that, with a $c$ large enough and for all large
enough $n$,

$P(B(n,c,I,\varepsilon)^c) \le P(A(n,\tau,M,\kappa)^c)$
$\quad + P(B(n,c,\{-\tau\},\varepsilon)^c \cap A(n,\tau,M,\kappa))$
$\quad + P(B(n,c,\{\tau\},\varepsilon)^c \cap A(n,\tau,M,\kappa))$
$\le 2\delta + \delta + \delta$,

and since $\delta > 0$ is arbitrary, the first assertion of the theorem will follow.

Without loss of generality, we assume that $\tau$ is chosen so large that $\tau >
M/\kappa$. Then, given $A(n,\tau,M,\kappa)$, we have, by our choice of $\tau$,

$\inf_{|s|\ge\tau} y_n(s) > M > \sup_{s\in K} y_n(s)$.

Let $\ell(\cdot;c,n,\tau)$ be the tangent line of $T_c(y_n)(s)$ at $s = \tau$, with slope $T_c(y_n)'(\tau+)$.
Then exactly one of the following three events can take place. If $c > \tau$, for
all large enough $n$, we have the following:

1. $\ell(s;c,n,\tau) \le y_n(s)$ for all $s \notin [-c,c]$;
2. $\ell(s;c,n,\tau) > y_n(s)$ for some $s > c$, and $\ell(s;c,n,\tau) \le y_n(s)$ for all $s < -c$;
3. $\ell(s;c,n,\tau) > y_n(s)$ for some $s < -c$, and $\ell(s;c,n,\tau) \le y_n(s)$ for all $s > c$.

In the case 1, $T_c(y_n)'(\tau) = T(y_n)'(\tau)$ if $c > \tau$.

From the assumptions defining case 2, we get, if $c > \tau$ and $A(n,\tau,M,\kappa)$
holds,

(80) $\inf_{s\ge c} \frac{y_n(s) - T(y_n)(\tau)}{s - \tau} \le T(y_n)'(\tau) \le T_c(y_n)'(\tau)$
$\quad \le \inf_{\tau<s\le c} \frac{y_n(s) - T_c(y_n)(\tau)}{s - \tau}$
$\quad \le \inf_{\tau<s\le c} \frac{y_n(s) - T(y_n)(\tau)}{s - \tau}$
$\quad \le \frac{y_n(2\tau) - T(y_n)(\tau)}{\tau}$,

where the last inequality holds if $c \ge 2\tau$. Assume that $\sup_{|s|\le 2\tau} |y_n(s)| \le M$,
with $M$ chosen so large that this event has probability larger than $1 - \delta/4$,
for all large enough $n$. Then the right-hand side of (80) is bounded by $2M/\tau$.
Thus, $T(y_n)'(\tau) = T_c(y_n)'(\tau)$, unless

(81) $\inf_{s\ge c} \frac{y_n(s) - T(y_n)(\tau)}{s - \tau} \le \frac{2M}{\tau}$.

But if (81) holds, we get from the first half of Assumption A4 (with e/3 in 
place of e) that, with probability > 1 — (5/4, 

inf < inf + - < 2 inf + - 

f<s<c S s>c S 3 C s>c S — T 3 
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~ C \s>c S — T C — T J 3 

c-Tf2M M \ e 6M e 
<2 + +-< + -, 

C \ T C — T J 6 T 6 

for c large enough. Using Assumption A4 again with e/3 in place of e and 
for some large f > r, this implies 

|r,(yO'(r)-T(y„)'(r)| 

^ . yn{s) -Tc{yn)(j) _ .^^ yn{s) - T{yn){T) 

~ f<s<c S — T s>f S — T 

2M . ^ yJs) , yJs) 
< + inf - inf 

T — T f <s<c S — T s>f S — T 

<^+( inf ^_inf^Uf inf inf ^"^^^ 



T — T \T<s<c S s>f S / \T<s<c S — T f<s<c S 

<^+f inf ^-inf^Uf^-l) inf ^ 

f — T \f<s<c S s>f S J \f — T J T<s<c S 

2M e T f6M e\ 

T — T 6 T — T \ T 6 J 

provided $\tilde\tau$ is chosen large enough and then $c$ is chosen so large that
Assumption A4 holds. Thus, given $A(n,\tau,M,\kappa)$ and case 2, $B(n,c,\{\tau\},\varepsilon)$ holds
unless $\sup_{|s|\le 2\tau} |y_n(s)| > M$ or if the first half of Assumption A4 fails, which is
an event with probability at most $\delta/4 + \delta/4 = \delta/2$.

Given $A(n,\tau,M,\kappa)$ and case 3, a similar argument implies that $B(n,c,\{\tau\},\varepsilon)$
fails with probability at most $\delta/2$. Combining cases 1-3, we deduce (78),
and (79) is proved in the same way.

To show the second part of the theorem, for $I$ an arbitrary interval of
$(-c,c)$ containing $[-\tau,\tau]$, define the sets

$C(n,c,I,\varepsilon) = \{\sup_{s\in I} |T_c(y_n)(s) - T(y_n)(s)| \le \varepsilon\}$.

Let $L = \mathrm{length}(I)$. Suppose that Assumptions A3 and A.1 hold (an event
with probability $> 1 - 2\delta$). We will apply (73) with $A = K$ as in
Assumption A.1 and $B = [-\tau,\tau]$. Assume also $\tau > M/\kappa$, with $M, \tau, \kappa$ as in
Assumptions A3 and A.1. Then there is an $\eta \in [-\tau,\tau]$ such that $|y_n(\eta) - T(y_n)(\eta)| \le
\varepsilon/2$. Since $T(y_n)(\eta) \le T_c(y_n)(\eta) \le y_n(\eta)$, we get $|T_c(y_n)(\eta) - T(y_n)(\eta)| \le \varepsilon/2$.
Thus,

$P(C(n,c,I,\varepsilon)^c) \le P(B(n,c,I,\varepsilon/2L)^c) + 2\delta$,

and the second part of the theorem follows from the first. □

Note A.1. If $T$ is replaced with the greatest convex minorant $T_O$ on an
interval $O$ of $\mathbb{R}$, Theorem A.1 trivially still holds. If $T$ is replaced with $T_{O_n}$
where $O_n$ is a sequence of intervals such that $O_n \uparrow \mathbb{R}$, the theorem still holds.
In the latter case Assumptions A3, A4 and A.1 may be relaxed somewhat;
all suprema and infima with respect to $s$ over $\mathbb{R}$ can be relaxed to suprema
and infima with respect to $s$ over $O_n$.



Lemma A.3. Assume Assumptions A1, A2, A3 and A4 hold. Define
$A_{n,\Delta} = [t_0 - \Delta d_n, t_0 + \Delta d_n]$. Then for every finite $\Delta$, and every $\varepsilon > 0$,

$\lim_{c\to\infty} \liminf_{n\to\infty} P\bigl(\sup_{A_{n,\Delta}} d_n^{-p} |T_{c,n}(x_n)(\cdot) - T_J(x_n)(\cdot)| \le \varepsilon\bigr) = 1$,

$\lim_{c\to\infty} \liminf_{n\to\infty} P\bigl(\sup_{A_{n,\Delta}} d_n^{-p+1} |T_{c,n}(x_n)'(\cdot) - T_J(x_n)'(\cdot)| \le \varepsilon\bigr) = 1$.

Proof. From (74) and (23), it follows that

$\sup_{A_{n,\Delta}} d_n^{-p} |T_{c,n}(x_n)(\cdot) - T_J(x_n)(\cdot)| = \sup_{[-\Delta,\Delta]} |T_c(y_n)(\cdot) - T_{J_{n,t_0}}(y_n)(\cdot)|$

and

$\sup_{A_{n,\Delta}} d_n^{-p+1} |T_{c,n}(x_n)'(\cdot) - T_J(x_n)'(\cdot)| = \sup_{[-\Delta,\Delta]} |T_c(y_n)'(\cdot) - T_{J_{n,t_0}}(y_n)'(\cdot)|$,

with $y_n$ as defined in (13).

If $J = \mathbb{R}$, we use Theorem A.1 with $I = [-\Delta,\Delta]$, and if $J \ne \mathbb{R}$, we use
Note A.1 with $O_n = J_{n,t_0}$.

Assumptions A3 and A4 are satisfied because of Proposition 1, and
Assumption A.1 is implied by Assumption A1 and (12). Thus, all the regularity
conditions of Theorem A.1 are satisfied, and the lemma follows. □

Lemma A.4. Suppose $y \in D(\mathbb{R})$. Then the function $t \mapsto T_c(y)'(t) - T(y)'(t)$
is increasing on $[-c,c]$.

Proof. Let $\{J_k\}$ be a sequence of open intervals in $(-c,c)$ such that
their union covers $(-c,c)$. Without loss of generality, we can assume that
each $J_k$ either contains no points of touch of $T(y)$ and $T_c(y)$ or it contains
exactly one simply connected set $O_{J_k}$ of points of touch (so then $O_{J_k}$ consists
of either a single point or it is an interval). If $O_{J_k}$ is empty, by the first
part of Lemma A.1, $T(y)$ is linear on $J_k$, and since $T_c(y)$ is convex, the
assertion follows. If $O_{J_k}$ is nonempty, since $T_c(y) \ge T(y)$, for all $t \in [-c,c]$,
the assertion holds again, and the lemma is proven. □

Table 2
Rescaled processes 



Section 


Vn{s;t) 




n 


3.1 


CnWf,{s) 


d~^{nh)~^af. 


ndn 


3.2.1 


Cn J {uin{s — u) — ttJfi ( — It) ) fc' ( — u) du 


d~'^{nh)~^af. 


nh 


3.3.1 


Cn J {Wn{s — It) — Wii{ — u))k{u) du 


d~'^(nh)~^(Jn 


nh 


3.3.2 


Cn / {wfi{—uh/dn + s) — Wft{—uh/d„)k{u))du 


d~^n~^af. 


ndn 


3.3.3 




{d„nhy^an 


nh 



APPENDIX B: BOUND ON DRIFT OF PROCESS PART: PARTIAL SUMS

In this appendix we will establish Proposition 1 for the various
applications of Section 3. By Proposition 1, Assumptions A3 and A4 are implied
by Assumption A2 and the following:

Assumption B.1. Assume that for any $\varepsilon, \delta > 0$, there exist $\kappa = \kappa(\varepsilon,\delta) > 0$
and $\tau = \tau(\varepsilon,\delta) > 0$ such that

$\sup_n P\bigl(\sup_{|s|\ge\tau} \frac{|v_n(s;t)|}{\kappa|s|} > \varepsilon\bigr) < \delta$.

In all the cases of Section 3, the rescaled process $v_n$ can be written as a
function of the partial sum process $w_n$; see Table 2. (Note that in some of
the cases in Section 3, $v_n$ is a function of $\bar w_n$ instead of $w_n$. However, since
$\bar w_n$ is a smoothed version of $w_n$, the bounds on $w_n$ established in this section
are easily shown to translate to bounds on $\bar w_n$.)

In all the above cases we have $c_n \to c > 0$ as $n \to \infty$. Therefore, we start
by establishing the following:

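Lemma B.1. Suppose $\{\varepsilon_i\}_{i\ge 1}$ is a stationary independent process, a
weakly dependent sequence satisfying Assumption A8 or A9 or a long range
dependent subordinated Gaussian sequence with parameters $d$ and $r$, and $\beta$
as in (36), and assume that $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2 < \infty$. Then for each
$\varepsilon, \delta, \kappa > 0$, there exist $\tau = \tau(\varepsilon,\delta,\kappa) > 0$ and $m_0 = m_0(\varepsilon,\delta,\kappa) < \infty$ such that

(82) $\sup_{m\ge m_0} P\bigl(\sup_{|s|\ge\tau} \frac{|w_m(s)|}{\kappa|s|} > \varepsilon\bigr) < \delta$.

Observe that, with $\tau < \infty$, $\varepsilon > 0$ and $n$ fixed, we have

(83) $P\bigl(\sup_{|s|\ge\tau} \frac{|w_n(s)|}{\kappa|s|} > \varepsilon\bigr) \le 2P\bigl(\sup_{s\ge\tau} \frac{|w_n(s)|}{\kappa|s|} > \varepsilon\bigr)$

$\le 2\sum_{s_i\ge\tau} P\bigl(|w_n(s_i)| > \tfrac{\varepsilon}{2}\kappa s_i\bigr)$

$\quad + 2\sum_{s_i\ge\tau} P\bigl(\sup_{s_{i-1}\le s\le s_i} |w_n(s) - w_n(s_{i-1})| > \tfrac{\varepsilon}{2}\kappa s_{i-1}\bigr)$,

where $\cdots < s_{-1} < s_0 = 0 < s_1 < \cdots$ is a partition of $\mathbb{R}$. Note that we used the fact
that $s \mapsto \kappa|s|$ is increasing for $s > 0$ in the second inequality in (83).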

Lemma B.2. Under the assumptions in Lemma B.1,

$P\bigl(\sup_{s_i\le s\le s_{i+1}} |w_n(s) - w_n(s_i)| > \tfrac{\varepsilon}{2}\kappa|s_i|\bigr) \le \frac{C\Delta_i^{2\beta}}{\varepsilon^2\kappa^2 s_i^2}$,

where $\Delta_i = s_{i+1} - s_i$, and $\beta = 1/2$ in the independent and weakly dependent
cases and $1/2 < \beta = 1 - rd/2 < 1$ in the long range dependent case.

Proof. Let Sk = J2i=i ^i, n = nAj and assume that Si < s < Sj+i. Then 
by stationarity, 

Wn{s) - Wn{Si) = Wfi{s - Si), 

at least when Sj = d~^{ti — tQ) for some observation point tj, which we assume 
w.l.o.g. Thus, 

P{ sup \Wn{s) - Wn{Si+l)\>^K\Si\] = P\ ITLcQ^ \Sk\> XorA, 
\s,<s<Si+i ^ J \\<k<n J 

with a\ = Var(S'fi) and 

> _ e <7n , I 
2 an 

For independent data, equation (10.7) in [8] imphes 
(84) P f max \Sk\>Xaf) 

.1/2 



Vl<A:<n / 



Since an /am = A-' for independent data, this proves the lemma in this 
case. 

In the weakly dependent mixing case, we use the results of McLeish [36]
to prove (84). Thus, denoting $\|z\|_q = (E|z|^q)^{1/q}$ for a random variable $z$, we
call the sequence $\{\varepsilon_i\}$ a mixingale if

(85) $\|\varepsilon_n - E(\varepsilon_n \mid \mathcal{F}_{n+h})\|_2 \le \psi_{h+1} c_n$,

(86) $\|E(\varepsilon_n \mid \mathcal{F}_{n-h})\|_2 \le \psi_h c_n$,

with $c_n, \psi_h$ finite and nonnegative constants and $\lim_{n\to\infty} \psi_n = 0$. Since each
$\varepsilon_n$ is $\mathcal{F}_n$-measurable, (85) holds. Assuming that $\varepsilon_n$ has finite variance, or in
the case of $\alpha$-mixing, finite fourth moment, Lemma 2.1 in [36] implies

$\|E(\varepsilon_n \mid \mathcal{F}_{n-h})\|_2 \le 2\phi(h)^{1/2} \|\varepsilon_n\|_2$,

$\|E(\varepsilon_n \mid \mathcal{F}_{n-h})\|_2 \le 2(\sqrt{2} + 1)\alpha(h)^{1/4} \|\varepsilon_n\|_4$.

Using Assumption A8 or A9, we will apply Theorem 1.6 in [36] with either 
il^n = 2(/<(n)^/2 and Cn = W^nh = or V'n = 2(V2 + l)a{n)^/^ and c„ = ||en||4- 
In either case X^^i V'^"^ < °o for some e > 0, which shows that ip{n) is 
of size —1/2 with the McLeish [36] terminology (as noted on top of page 
831 in [36]). Notice also that Ef=i c- = Cn for some C > and cr? ~ K^n 
according to (33). Thus, Theorem 6.1 in [36] and Chebyshev's inequality 

1/2 

imply (84). Notice that cr^i/cr^i ~ A. as n, n — > oo. 

In the long range dependent case we use Theorem 12.2 in [8]. Thus, 

£;(5?)~r?2/i(n)n2/3, 

with li as in (35), and according to de Haan [16], equation (12.42) in [8] is 
satisfied, with 

7 = 2, 
Q = 2/3, 

7Xz = (Cir?2/i(n))V2/3^ 

for some constant Ci > 0. Theorem 12.2 in [8] then implies that 

/ \ K' / C 

(87) P{ nmx\Sk\>Xan)< ' ''^ ^ 



Vl<fc<n' ^' J~{X(7ny\^^ J A2' 

with C = C1K2 2p- From de Haan [16] it follows that 

for all large enough h. Substituting for A in (87) completes the proof. □ 
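Proof of Lemma B.1. From (83) and Lemma B.2, we have

$P\bigl(\sup_{|s|\ge\tau} \frac{|w_n(s)|}{\kappa|s|} > \varepsilon\bigr) \le \frac{C}{\varepsilon^2\kappa^2}\Bigl(\sum_{s_i\ge\tau} \frac{E w_n^2(s_i)}{s_i^2} + \sum_{s_{i+1}\ge\tau} \frac{\Delta_i^{2\beta}}{s_i^2}\Bigr)$.

In order to take care of the slowly varying factor in $E w_n^2(s_i)$ for long range
dependent data, we write

$E w_n^2(s_i) \le C s_i^{2\beta'}$,

where $\beta' = 1/2$ in the independent and weakly dependent case and $\beta < \beta' < 1$
in the long range dependent case (this is no restriction if we assume $\Delta_i \ge
\delta > 0$ for all $i$ and some constant $\delta$). Replacing $\Delta_i^{2\beta}$ with $\Delta_i^{2\beta'}$, we want to
examine whether the sums

$\sum_{s_i\ge\tau} s_i^{2\beta'-2}$ and $\sum_{s_{i+1}\ge\tau} \frac{\Delta_i^{2\beta'}}{s_i^2}$

tend to zero as $\tau \to \infty$. Clearly, choosing $\Delta_i = \delta$, the first sum is divergent.
Instead we let $s_i = i^{\rho}$ with $\rho > 1$. Thus, the sums are of the order

$\sum_{i\ge\tau^{1/\rho}} i^{\rho(2\beta'-2)} \asymp (\tau^{1/\rho})^{2\rho(\beta'-1)+1}$,

$\sum_{i\ge\tau^{1/\rho}-1} i^{2(\rho-1)\beta'-2\rho} \asymp (\tau^{1/\rho})^{2(\rho-1)\beta'-2\rho+1}$.

The demand that both expressions should converge to zero as $\tau \to \infty$ implies
that

$-2(1-\beta') + \frac{1}{\rho} < 0$,

$-2(1-\beta') - \frac{2\beta'-1}{\rho} < 0$,

which shows that we should choose $\rho > 1/(2(1-\beta'))$ and this completes the
proof. □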
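The two exponent conditions at the end of the proof are simple enough to tabulate. The sketch below, with hypothetical names `tail_exponents` and `sums_vanish`, computes the powers of $\tau^{1/\rho}$ that govern the two tail sums for $s_i = i^{\rho}$ and reports whether both are negative.

```python
from fractions import Fraction


def tail_exponents(beta_prime, rho):
    """Powers of tau**(1/rho) controlling the two tail sums when s_i = i**rho."""
    b, r = Fraction(beta_prime), Fraction(rho)
    first = 2 * r * (b - 1) + 1
    second = 2 * (r - 1) * b - 2 * r + 1
    return first, second


def sums_vanish(beta_prime, rho):
    """True when both tail sums tend to zero as tau grows."""
    return all(e < 0 for e in tail_exponents(beta_prime, rho))
```

For $\beta' = 3/4$ the threshold is $\rho > 2$: `sums_vanish(Fraction(3, 4), 3)` holds, while $\rho = 2$ leaves the first exponent equal to zero. For $\beta' = 1/2$ any $\rho > 1$ works.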

Lemma B.1 immediately proves that $v_n(\cdot;t)$ in Section 3.1 satisfies
Assumption B.1.

Lemma B.3. Assume $\{w_n\}$ satisfies (82) in Lemma B.1 and is uniformly
bounded in probability on compact intervals. Let $\{l_n\}$ be a sequence
of functions with $\mathrm{supp}(l_n) \subset [-K,K]$ for some $K > 0$ and all $n$, and with
$\sup_n \int |l_n(u)|\,du < \infty$. Then

$v_n(s) = \int (w_n(s - u) - w_n(-u))\, l_n(u)\,du$

satisfies Assumption B.1.

Proof. Since

$|v_n(s)| \le \bigl(\sup_{u\in[-K,K]} |w_m(s - u)| + \sup_{u\in[-K,K]} |w_m(-u)|\bigr) \int |l_n(u)|\,du$,

we obtain

$\sup_{|s|\ge\tau} \frac{|v_n(s)|}{\kappa|s|} \le C\Bigl(\sup_{|s|\ge\tau} \sup_{u\in[-K,K]} \frac{|w_m(s - u)|}{\kappa|s|} + \sup_{|s|\ge\tau} \sup_{u\in[-K,K]} \frac{|w_m(-u)|}{\kappa|s|}\Bigr)$

$\le C' \sup_{|s|\ge\tau} \sup_{u\in[-K,K]} \frac{|w_m(s - u)|}{\kappa|s - u|} + C \sup_{u\in[-K,K]} \frac{|w_m(-u)|}{\kappa\tau}$

$= C' \sup_{|s|\ge\tau-K} \frac{|w_m(s)|}{\kappa|s|} + C \sup_{u\in[-K,K]} \frac{|w_m(-u)|}{\kappa\tau}$,

with $C = \sup_n \int |l_n(u)|\,du$ and $C' = C \sup_{|s|\ge\tau} \sup_{u\in[-K,K]} |s - u|/|s|$. Finally, (82)
and the fact that $\{w_m\}$ is uniformly bounded in probability on compact
intervals finish the proof. □



Applying Lemma B.3 with $l_n(u)$ equal to $c_n k'(-u)$, $c_n k(u)$ and $c_n k(d_n u/h)\, d_n/h$,
respectively, establishes Assumption B.1 in Sections 3.2.1, 3.3.1 and 3.3.2.



Lemma B.4. Assume $\{w_n\}$ satisfies (82) in Lemma B.1. Let $l$ be a
function of bounded variation with support in $[-1,1]$, and assume $\{\rho_n\}$ is a
sequence of numbers such that $\lim_{n\to\infty} \rho_n = 0$. Then

$v_n(s) = \frac{w_m * l(s\rho_n) - w_m * l(0)}{\rho_n} = \int w_m(-u)\, \frac{l(s\rho_n + u) - l(u)}{\rho_n}\,du$

satisfies Assumption B.1.



Proof. We will give different bounds on $v_n(s)$ for small and large values
of $s$. Assume that $|s| \le (\tau + 1)\rho_n^{-1}$, where $\tau > 0$ is a constant that will be
chosen below. Then
I . M ^ I I I . M /■ \l{spn-u)-l{-u)\ 

\Vn(S)\<\s\ sup \Wm{U)\ / j— j du 

\u\<T+2 J \s\Pn 

<\s\ sup |t(;m(^^)| / \l'{u)\du. 

|m|<t+2 j 

If instead $|s| > (\tau + 1)\rho_n^{-1}$, choose arbitrary $\varepsilon, \delta, \kappa_0 > 0$. Then Lemma B.2
implies the existence of $\tau = \tau(\varepsilon,\delta,\kappa_0) > 0$ such that, with probability larger
than $1 - \delta$, we have


\Vn{s)\ < Pn^(^ J \Wm{u)l {-u)\du + e Ko j \u\\l{spn -u)\du^ 

< / \l{u)\ du ( sup \Wm{u)\ + eKo{\s\pn + 1) 

J V|u|<l J 



< \s\ I \l{u)\du — — — he^o 1 + 



Thus, with probability larger than $1 - \delta$,

bn(s)| 
sup — j- 



<max( sup \wm{u)\ / \l'{u)\du, 
\|m|<t+2 J 

f \U f ^^V\u\<l\Wm{u)\ T + 2\\ 

j I^^^^I^H TTi + ""°7TTjj' 



Since we assume that $w_m$ is bounded on compacta uniformly over $m$, with
probability larger than $1 - 2\delta$, the right-hand side is bounded from above
by a constant $C = C(\varepsilon,\delta,\kappa_0) > 0$. Pick $\kappa = \kappa_0 C/\varepsilon$. Then

\Vnis)\ ^ e Vnis) 

sup - — — — < — sup — -— < e, 
with a probability larger than 1 — 25. □ 



Applying Lemma B.4 with $\rho_n = d_n/h$ and $l(u)$ equal to $k(u)$ establishes
Assumption B.1 in Section 3.3.3.



APPENDIX C: BOUND ON DRIFT OF PROCESS PART: EMPIRICAL DISTRIBUTIONS

In this appendix we will establish Assumption B.1 for the various
applications treated in Section 4. The processes $v_n(s;t_0)$ are functions of $w_{n,\delta_n}(s) :=
w_{n,\delta_n}(s;t_0)$ for all cases treated in Section 4, as seen in Table 3.

Table 3
Rescaled processes

Section   $v_n(s;t)$                                                         $c_n$                                    $\delta_n$
4.1       $c_n w_{n,\delta_n}(s)$                                            $d_n^{-2} n^{-1}\sigma_{n,\delta_n}$     $d_n$
4.2.1     $c_n \int (w_{n,\delta_n}(s-u) - w_{n,\delta_n}(-u))k'(-u)\,du$    $d_n^{-2}(nh)^{-1}\sigma_{n,\delta_n}$   $h$

In all the above cases we have $c_n \to c > 0$ as $n \to \infty$.

Lemma C.1. Assume $\{t_i\}$ is a stationary sequence with marginal
distribution $F$, such that $f(t_0)$ exists, and $\delta_n \downarrow 0$, $n\delta_n \uparrow \infty$ as $n \to \infty$. Then, if
$\{t_i\}$ is an independent or $\phi$-mixing sequence with $\sum_{i\ge 1} \phi_i^{1/2} < \infty$, there
exists for each $\varepsilon, \delta, \kappa > 0$ a $\tau = \tau(\varepsilon,\delta,\kappa) > 0$ such that

(88) $P\bigl(\sup_{|s|\ge\tau} \frac{|w_{n,\delta_n}(s)|}{\kappa|s|} > \varepsilon\bigr) < \delta$,

for all large enough $n$. If $\{t_i\}$ is a long range dependent subordinated
Gaussian sequence and satisfies the assumptions of Theorem 9, then for each
$\varepsilon, \delta > 0$, there exist $\kappa = \kappa(\varepsilon,\delta) > 0$ and $\tau = \tau(\varepsilon,\delta) > 0$ such that (88) holds.

Proof. We start by proving the lemma for long range dependent data.
Then

$$
w_{n,\delta_n}(s) = \frac{\eta_r(t_0 + s\delta_n) - \eta_r(t_0)}{\delta_n|\eta_r'(t_0)|}\,(1 + o(1))\,\sigma_n^{-1}\sum_{i=1}^{n} h_r(\xi_i)
 + C_n\delta_n^{-1}\bigl(S_n(t_0 + s\delta_n) - S_n(t_0)\bigr),
$$

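where $C_n \to r!/|\eta_r'(t_0)|$ as $n \to \infty$. Clearly, $\|\eta_r\|_\infty = \sup_t|\eta_r(t)| < \infty$. More-
over, since $\eta_r'(t_0) \ne 0$, there exists a $\delta > 0$ such that $|\eta_r(t_0 + s) - \eta_r(t_0)|/|s| <
2|\eta_r'(t_0)|$ whenever $|s| < \delta$. Thus,

$$
\frac{|\eta_r(t_0 + s\delta_n) - \eta_r(t_0)|}{\delta_n|s|\,|\eta_r'(t_0)|} \le \max\left(2, \frac{2\|\eta_r\|_\infty}{\delta|\eta_r'(t_0)|}\right).
$$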

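Further, since $\sigma_n^{-1}\sum_{i=1}^{n} h_r(\xi_i)$ converges in distribution
as $n \to \infty$, and since from [12],

$$
\sup_{s}\,\delta_n^{-1}|S_n(t_0 + s\delta_n) - S_n(t_0)| \xrightarrow{P} 0
$$

as $n \to \infty$, the result follows.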

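In the independent and weakly dependent data case, we consider w.l.o.g.
the supremum for $s > \tau$ only. Analogously to (83), we have

$$
\begin{aligned}
P\left(\sup_{s > \tau}\frac{|w_{n,\delta_n}(s)|}{\kappa|s|} > \epsilon\right)
&\le \sum_{s_i > \tau} P\left(|w_{n,\delta_n}(s_{i-1})| > \frac{\epsilon\kappa}{2}\,s_{i-1}\right) \\
&\quad + \sum_{s_i > \tau} P\left(\sup_{s_{i-1} < s < s_i}|w_{n,\delta_n}(s) - w_{n,\delta_n}(s_{i-1})| > \frac{\epsilon\kappa}{2}\,s_{i-1}\right),
\end{aligned}
$$

where $0 < s_1 < s_2 < \cdots$ is an increasing sequence. Assume first that $F \sim
U(0, 1)$. To proceed further, we need the following lemma, proved in [3].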

Lemma C.2. Suppose $\{t_i\}$ is an independent or weakly dependent se-
quence of random variables, satisfying the assumptions of Lemma C.1. Then

for all $\lambda > 0$ if $\Delta_j = s_{j+1} - s_j > 1$, with $K$ a constant depending only
on $\{\phi_n\}$.

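The next part of the proof proceeds similarly to the proof of Lemma B.2,
so we highlight only the differences. Let $g_s$ be defined as in the proof of
Lemma B.1. Then from [8], page 172, we get

$$
\begin{aligned}
E\bigl(w_{n,\delta_n}^2(s)\bigr) &\le \sigma_{n,\delta_n}^{-2}\Bigl(1 + 4\sum_{i=1}^{\infty}\phi_i^{1/2}\Bigr)\,n\,E\bigl(g_s(t_i) - g_0(t_i)\bigr)^2 \\
&\le 2\Bigl(1 + 4\sum_{i=1}^{\infty}\phi_i^{1/2}\Bigr)s =: Cs.
\end{aligned}
$$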

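By Lemma C.2 and Chebyshev's inequality, the lemma is proved for $F \sim
U(0, 1)$ if we can prove that the sums

$$
\sum_{s_i > \tau}\frac{1}{s_{i-1}} \quad\text{and}\quad \sum_{s_i > \tau}\frac{\Delta_{i-1}}{s_{i-1}^2}
$$

tend to zero as $\tau \to \infty$. But this is true if $s_i = i^p$ for any $p > 1$.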
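
To see why a polynomial grid suffices, here is a sketch under the assumption (the original summands are not fully recoverable) that, up to constants and an index shift, the two sums have terms of order $1/s_i$ and $\Delta_i/s_i^2$, with $\Delta_i = s_{i+1} - s_i$. This is the order one gets from Chebyshev's inequality with a second moment bound linear in $s$. The cutoff $N$ is a symbol introduced here.

```latex
% Assumed grid: s_i = i^p with p > 1, and \Delta_i = s_{i+1} - s_i.
% Mean value theorem: \Delta_i = (i+1)^p - i^p \le p (i+1)^{p-1} \le p\,2^{p-1}\, i^{p-1} for i \ge 1.
\begin{align*}
\sum_{i \ge N}\frac{1}{s_i} &= \sum_{i \ge N} i^{-p}
  \le N^{-p} + \int_N^\infty x^{-p}\,dx = N^{-p} + \frac{N^{1-p}}{p-1}, \\
\sum_{i \ge N}\frac{\Delta_i}{s_i^2} &\le p\,2^{p-1}\sum_{i \ge N} i^{-(p+1)}
  \le p\,2^{p-1}\Bigl(N^{-(p+1)} + \frac{N^{-p}}{p}\Bigr).
\end{align*}
% Both bounds tend to 0 as N \to \infty. Since s_i > \tau forces i > \tau^{1/p},
% the cutoff N grows without bound as \tau \to \infty.
```

For $p = 1$ the first sum is the harmonic series and diverges, which is why $p > 1$ is needed.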

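Consider again a general $F$ with $f(t_0) > 0$. Let $w^0_{n,\delta_n}$ and $\sigma^0_{n,\delta_n}$ be the
quantities corresponding to $w_{n,\delta_n}$ and $\sigma_{n,\delta_n}$ when $F \sim U(0, 1)$. Then

$$
w_{n,\delta_n}(s; t_0) = w^0_{n,\tilde\delta_n}(\tilde s; F(t_0)), \tag{89}
$$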



where $\tilde\delta_n = F(t_0 + \delta_n) - F(t_0)$ and $\tilde s = (F(t_0 + s\delta_n) - F(t_0))/(F(t_0 + \delta_n) -
F(t_0))$. Choose $\delta > 0$ such that $f(t_0)/2 < |F(t_0 + s) - F(t_0)|/|s| < 2f(t_0)$
if $0 < |s| < \delta$. Then, since $|F(t_0 + s\delta_n) - F(t_0)| \le 1$, it follows that

$$
\sup_{s \ne 0}\left|\frac{\tilde s}{s}\right| \le \max\left(4, \frac{2}{\delta f(t_0)}\right) \tag{90}
$$

for all $n$ so large that $\delta_n < \delta$. Now (89), (90) and the proof of (88) when
$F \sim U(0, 1)$ finish the proof of (88) for general $F$. □
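
The bound (90) can be checked by splitting into two cases. The sketch below reads the choice of $\delta$ as a condition on $|F(t_0 + u) - F(t_0)|/|u|$ for $0 < |u| < \delta$; this reading, and the symbol $u$, are assumptions made here for the illustration.

```latex
% Notation: \tilde s = (F(t_0 + s\delta_n) - F(t_0)) / (F(t_0 + \delta_n) - F(t_0)).
% Assumption on \delta: f(t_0)/2 < |F(t_0+u) - F(t_0)|/|u| < 2 f(t_0) for 0 < |u| < \delta,
% and n is large enough that \delta_n < \delta.
% Denominator: F(t_0 + \delta_n) - F(t_0) > f(t_0)\,\delta_n / 2 (take u = \delta_n).
\begin{align*}
\text{Case } 0 < |s|\delta_n < \delta:\quad
  |\tilde s| &< \frac{2 f(t_0)\,|s|\,\delta_n}{f(t_0)\,\delta_n/2} = 4|s|, \\
\text{Case } |s|\delta_n \ge \delta:\quad
  |\tilde s| &\le \frac{1}{f(t_0)\,\delta_n/2} = \frac{2}{f(t_0)\,\delta_n}
   \le \frac{2}{f(t_0)\,\delta_n}\cdot\frac{|s|\,\delta_n}{\delta} = \frac{2|s|}{\delta f(t_0)}.
\end{align*}
% Dividing by |s| in each case gives |\tilde s / s| \le \max(4, 2/(\delta f(t_0))).
```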

To establish Assumption B.1 for the various choices of $v_n(\cdot\,; t_0)$ in the
table in this appendix, we proceed as in Appendix B, making use of Lemmas
B.3 and C.1.